Generating all unique crossword puzzle grids

I want to generate all unique crossword puzzle grids of a certain size (4x4 is a good size). All possible puzzles, including non-unique ones, can be represented as a binary string whose length is the grid area (16 in the case of 4x4), so all possible 4x4 puzzles correspond to the binary forms of the numbers in the range 0 to 2^16 - 1.
Generating these is easy, but I'm curious whether anyone has a good solution for programmatically eliminating invalid and duplicate cases. For example, all puzzles consisting of only a single column or a single row are functionally identical, so 7 of those 8 cases can be eliminated. Also, according to crossword puzzle conventions, all squares must be contiguous. I've had success removing all duplicate structures, but my solution took several minutes to execute and probably was not ideal. I'm at something of a loss for how to detect contiguity, so if anyone has ideas on this it would be much appreciated.
I'd prefer solutions in Python, but write in whichever language you prefer. If anyone wants, I can post my Python code for generating all grids and removing duplicates, slow as it may be.

Disclaimer: mostly untested, apart from checking that each test does filter out some grids; a few errors I spotted have been fixed. It can certainly be optimized.
def is_valid_grid(n):
    row_mask = (1 << n) - 1
    top_row = row_mask << n * (n - 1)
    left_column = 0
    right_column = 0
    for row in range(n):
        left_column |= (1 << (n - 1)) << row * n
        right_column |= 1 << row * n

    def neighborhood(grid):
        # All cells orthogonally adjacent to the set cells of grid.
        return (((grid & ~left_column) << 1)
                | ((grid & ~right_column) >> 1)
                | ((grid & ~top_row) << n)
                | (grid >> n))

    def is_contiguous(grid):
        # Start with a single bit and expand with neighbors as long as
        # possible.  If we arrive at the starting grid then it is
        # contiguous, else not.
        part = grid ^ (grid & (grid - 1))
        while True:
            expanded = part | (neighborhood(part) & grid)
            if expanded != part:
                part = expanded
            else:
                break
        return part == grid

    def flip_y(grid):
        rows = []
        for k in range(n):
            rows.append(grid & row_mask)
            grid >>= n
        for row in rows:
            grid = (grid << n) | row
        return grid

    def rotate(grid):
        # Rotate the grid by 90 degrees.
        rotated = 0
        for x in range(n):
            for y in range(n):
                if grid & (1 << (n * y + x)):
                    rotated |= 1 << (n * x + (n - 1 - y))
        return rotated

    def transform(grid):
        # Yield the 7 non-trivial rotations/reflections of grid.
        yield flip_y(grid)
        for k in range(3):
            grid = rotate(grid)
            yield grid
            yield flip_y(grid)

    def do_is_valid_grid(grid):
        # Any square in the topmost row?
        if not (grid & top_row):
            return False
        # Any square in the leftmost column?
        if not (grid & left_column):
            return False
        # Is it contiguous?
        if not is_contiguous(grid):
            return False
        # Of all transformations, we keep only the one that gives the
        # smallest number.
        for transformation in transform(grid):
            # A transformation can produce a grid without a square in the
            # topmost row and/or leftmost column.
            while not (transformation & top_row):
                transformation <<= n
            while not (transformation & left_column):
                transformation <<= 1
            if transformation < grid:
                return False
        return True

    return do_is_valid_grid

def valid_grids(n):
    do_is_valid_grid = is_valid_grid(n)
    for grid in range(2 ** (n * n)):
        if do_is_valid_grid(grid):
            yield grid

for grid in valid_grids(4):
    print(grid)
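For comparison, the contiguity test above is a flood fill done with bit operations. The same idea written over explicit (row, col) coordinates looks like this; a rough sketch, not part of the answer's code:
from collections import deque

def cells_are_contiguous(cells):
    # cells: set of (row, col) coordinates of the filled squares
    if not cells:
        return True
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (nr, nc) in cells and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen == cells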


What is the time complexity of the following algorithm in Big Theta notation?

res = 0
for i in range(1, n):
    j = i
    while j % 2 == 0:
        j = j / 2
        res = res + j
I understand that an upper bound is O(n log n); however, I'm wondering if it's possible to find a tighter bound. I'm stuck with the analysis.
Some ideas that may be helpful:
You could create a function g(n) that annotates your function f(n), counting how many operations occur when running f(n):
def f(n):
    res = 0
    for i in range(1, n):
        j = i
        while j % 2 == 0:
            j = j / 2
            res = res + j
    return res

def g(n):
    comparisons = 0
    operations = 0
    assignments = 0
    assignments += 1        # res = 0
    res = 0
    assignments += 1        # i = 1
    comparisons += 1        # i < n
    for i in range(1, n):
        assignments += 1    # j = i
        j = i
        operations += 1     # j % 2
        comparisons += 1    # == 0
        while j % 2 == 0:
            operations += 1     # j / 2
            assignments += 1    # assign to j
            j = j / 2
            operations += 1     # res + j
            assignments += 1    # assign to res
            res = res + j
            operations += 1     # j % 2
            comparisons += 1    # == 0
        operations += 1     # i + 1
        assignments += 1    # assign to i
        comparisons += 1    # i < n ?
    return operations + comparisons + assignments
For n = 1, the code runs without hitting any loops: assigning the value of res, assigning i as 1, and comparing i to n (skipping the loop as a result).
For n > 1, you enter the for loop, and the for statement is all that changes the loop variable, so the complexity of the rest of the code is at least O(n).
Once in the loop:
If i is odd, you only assign j, perform the mod operation and compare to zero. That is the case for half the values of i, so each such iteration adds a fixed handful of operations (including the loop bookkeeping). So that's still O(n), just with a larger constant.
If i is even, we divide by 2 until it is odd. This is the part whose impact we need to work out.
Based on my counting of the different operations, I get:
g_initial_setup = 3 (every time)
g_for_any_i = 6 (paid for every i; for odd i, which is half the time, this is all)
g_for_even_i = 6 for each time we divide by two (the other half of the time)
For a random even i between 2 and n, half the time we only need to divide by two once, half the remaining time by two again, half the remaining time by two again, and so on. So as n goes to infinity the expected number of halvings per i approaches the series sum(1/2^m) for m from 1 upwards, and we multiply that by the 6 operations done for each halving of j.
I would expect from this:
g(n) = 3 + (n * 6) + (n * 6) * sum( 1 / pow(2,m) for m from 1 upwards )
Given that this infinite series sums to 1, we can simplify to:
g(n) ≈ 3 + 12n as n approaches infinity.
That implies that the algorithm is O(n), and since the for loop alone is Ω(n), it is Θ(n). Huh. I did not expect that.
Let's try out the function g(n) from above, counting all the operations that are occurring as f(n) is computed.
g(1) = 3 operations
g(2) = 9
g(3) = 21
g(4) = 27
g(5) = 45
g(10) = 99
g(100) = 1167
g(1000) = 11943
g(10000) = 119943
g(100000) = 1199931
g(1000000) = 11999919
g(10000000) = 119999907
Okay, unless I've really made a serious error here, it's O(n).
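There is also an exact way to see the linear bound: the total number of halvings over i = 1 .. n-1 is the sum of the trailing-zero counts of those i, which equals (n-1) minus the number of one-bits in n-1, and so is always below n. A quick sketch to check that claim (the helper name is made up, not part of the code above):
def halvings(n):
    # count how many times the inner while-loop body of f(n) runs
    total = 0
    for i in range(1, n):
        j = i
        while j % 2 == 0:
            j //= 2
            total += 1
    return total

for n in (10, 100, 1000, 10000):
    print(n, halvings(n), (n - 1) - bin(n - 1).count('1'))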

How to calculate the number of scatterplot data points in a particular 'region' of the graph

As my question says, I'm trying to find a way to calculate the number of scatterplot data points (pink dots) in a particular 'region' of the graph, i.e. on either side of the black lines/boundaries. Open to any ideas as I don't even know where to start. Thank you!!
The code:
################################
############ GES ##############
################################
p = fits.open('GES_DR17.fits')
pfeh = p[1].data['Fe_H']
pmgfe = p[1].data['Mg_Fe']
pmnfe = p[1].data['Mn_Fe']
palfe = p[1].data['Al_Fe']
#Calculate [Mg/Mn]
pmgmn = pmgfe - pmnfe
ax1a.scatter(palfe, pmgmn, c='thistle', marker='.',alpha=0.8,s=500,edgecolors='black',lw=0.3, vmin=-2.5, vmax=0.65)
ax1a.plot([-1,-0.07],[0.25,0.25], c='black')
ax1a.plot([-0.07,1.0],[0.25,0.25], '--', c='black')
x = np.arange(-0.15,0.4,0.01)
ax1a.plot(x,4.25*x+0.8875, 'k', c='black')
Let's call the two axes x and y. Any line in this plot can be written as
a*x + b*y + c = 0
for some values of a, b, c. If we plug a point with coordinates (x, y) into the left-hand side of the equation above, we get a positive value for all points on one side of the line and a negative value for points on the other side. So if you have multiple regions delimited by lines, you can just check the signs. With this you can create a boolean mask for each region and count the number of Trues using np.sum.
# assign the coordinates to the variables x and y as numpy arrays
x = ...
y = ...
line1 = a1*x + b1*y + c1
line2 = a2*x + b2*y + c2
mask = (line1 > 0) & (line2 < 0) # just an example, signs might vary
count = np.sum(mask)
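Applied to the specific boundaries plotted in the question (the horizontal line y = 0.25 and the line y = 4.25*x + 0.8875), a sketch might look like the following; which sign combination corresponds to which region is an assumption, so flip the comparisons to match the region you actually want:
import numpy as np

xs = np.asarray(palfe)    # [Al/Fe] values from the question's code
ys = np.asarray(pmgmn)    # [Mg/Mn] values from the question's code

# each boundary rewritten in the form a*x + b*y + c
horizontal = ys - 0.25               # the line y = 0.25
slanted = 4.25 * xs + 0.8875 - ys    # the line y = 4.25*x + 0.8875

# example: points above the horizontal line and above the slanted line
mask = (horizontal > 0) & (slanted < 0)
print(np.sum(mask), "points in that region")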

Maximizing with a constraint on the number of distinct SKUs (not greater than X)

I'm building an optimization tool using PuLP.
Its purpose is to define which SKUs to take and which SKUs to leave from each warehouse.
I'm having trouble with the following constraint:
"The maximum number of different SKUs selected should not exceed 500"
That is to say, no matter how many units you take, as long as they do not exceed 500 varieties (different SKUs), it's all good.
This is what I've got so far
# simplex
df = pd.read_excel(ruta + "actual/202109.xlsx", nrows=20)  # read the new month's base data
# Create variables and model
x = pulp.LpVariable.dicts("x", df.index, lowBound=0)
mod = pulp.LpProblem("Budget", pulp.LpMaximize)
# Objective function
objvals = {idx: (1.0) * (df['costo_unitario'][idx]) for idx in df.index}
mod += sum([x[idx] * objvals[idx] for idx in df.index])
# Lower and upper bounds:
for idx in df.index:
    mod += x[idx] <= df['unidades_sobrestock'][idx]
# Budget sum
mod += sum([x[idx] for idx in df.index]) <= max_uni
# Solve model
mod.solve()
# Output solution
for idx in df.index:
    print(str(idx) + " " + str(x[idx].value()))
print('Objective' + " " + str(pulp.value(mod.objective)))
In the same dataframe, I have a column with the SKU of each particular row df['SKU']
I'm imagining that the constraint should look something like:
for idx in df.index:
    mod += df['SKU'].count(distinct) <= 500
but that doesn't seem to work.
Thanks!
You will need a binary variable y[i] to indicate if a SKU is used. In math-like notation:
x[i] ≤ maxx[i]*y[i] (y[i] = 0 ==> x[i] = 0)
sum(i, y[i]) ≤ maxy (limit number of different SKUs)
y[i] ∈ {0,1} (binary variable)
where
maxx[i] = upper bound on x[i]
maxy = limit on number of different SKUs
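A sketch of what this could look like with PuLP, reusing the names from the question; here maxx[i] is taken to be the existing upper bound df['unidades_sobrestock'], which is an assumption about your data. If several rows can share the same SKU, index y by the distinct values of df['SKU'] instead and link every row of a SKU to its indicator.
# binary indicator: y[idx] = 1 if any units of row idx are taken
y = pulp.LpVariable.dicts("y", df.index, cat="Binary")

for idx in df.index:
    # linking constraint: x[idx] can only be positive when y[idx] = 1
    mod += x[idx] <= df['unidades_sobrestock'][idx] * y[idx]

# limit the number of different SKUs selected
mod += pulp.lpSum(y[idx] for idx in df.index) <= 500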

Iterating over multidimensional Numpy array

What is the fastest way to iterate over all elements in a 3D NumPy array? If array.shape = (r,c,z), there must be something faster than this:
import numpy as np

x = np.asarray(range(12)).reshape((1, 4, 3))

# function that sums nearest neighbor values
# e is my element location, d is the distance
def nn(arr, e, d=1):
    d = e[0]
    r = e[1]
    c = e[2]
    return (sum(arr[d, r-1, c-1:c+2]) + sum(arr[d, r+1, c-1:c+2])
            + arr[d, r, c-1] + arr[d, r, c+1])
Instead of creating a nested for loop like the one below to create my values of e to run the function nn for each pixel :
for dim in range(z):
    for row in range(r):
        for col in range(c):
            e = (dim, row, col)
I'd like to vectorize my nn function in a way that extracts location information for each element (e = (0,1,1) for example) and iterates over ALL elements in my matrix without having to manually input each locational value of e OR creating a messy nested for loop. I'm not sure how to apply np.vectorize to this problem. Thanks!
It is easy to vectorize over the d dimension:
def nn(arr, e):
    r, c = e  # (e[0], e[1])
    return (np.sum(arr[:, r-1, c-1:c+2], axis=1) + np.sum(arr[:, r+1, c-1:c+2], axis=1)
            + arr[:, r, c-1] + arr[:, r, c+1])
Now just iterate over the row and col dimensions, returning a vector that is assigned to the appropriate slot in x:
for row in <correct range>:
    for col in <correct range>:
        x[:, row, col] = nn(data, (row, col))
The next step is to make rows and cols broadcastable index arrays, something like
rows = np.arange(1, nrows - 1)[:, None]
cols = np.arange(1, ncols - 1)
and then combine shifted indexing expressions such as arr[:, rows-1, cols-1] + arr[:, rows-1, cols] + ... etc.
This kind of problem has come up many times, under various descriptions - convolution, smoothing, filtering, etc.
We could do some searches to find the best approach, or if you prefer, we could guide you through the steps.
Converting a nested loop calculation to Numpy for speedup
is a question similar to yours. There are only 2 levels of looping, and the sum expression is different, but I think it has the same issues:
for h in xrange(1, height-1):
    for w in xrange(1, width-1):
        new_gr[h][w] = (gr[h][w] + gr[h][w-1] + gr[h-1][w]
                        + t * gr[h+1][w-1] - 2 * (gr[h][w-1] + t * gr[h-1][w]))
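As an aside, a fully vectorized 8-neighbor sum over all interior cells can be written with plain slicing and no Python-level loop; a sketch that ignores any nodata masking, with an example array used only for illustration:
import numpy as np

arr = np.arange(24, dtype=float).reshape(2, 3, 4)  # example data, shape (z, rows, cols)

# each term is the array shifted by one of the 8 neighbor offsets
neighbor_sum = (arr[:, :-2, :-2] + arr[:, :-2, 1:-1] + arr[:, :-2, 2:]
                + arr[:, 1:-1, :-2] + arr[:, 1:-1, 2:]
                + arr[:, 2:, :-2] + arr[:, 2:, 1:-1] + arr[:, 2:, 2:])

# neighbor_sum[d, i, j] is the neighbor sum of arr[d, i+1, j+1]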
Here's what I ended up doing. Since I'm returning the xv vector and slipping it into the larger 3D array lag, this should speed up the process, right? data is my input dataset.
def nn3d(arr, e):
    r, c = e
    n = np.copy(arr[:, r-1:r+2, c-1:c+2])
    n[:, 1, 1] = 0
    n3d = np.ma.masked_where(n == nodata, n)
    xv = np.zeros(arr.shape[0])
    for d in range(arr.shape[0]):
        if np.ma.count(n3d[d, :, :]) < 2:
            element = nodata
        else:
            element = np.sum(n3d[d, :, :]) / (np.ma.count(n3d[d, :, :]) - 1)
        xv[d] = element
    return xv

lag = np.zeros(shape=data.shape)
for r in range(1, data.shape[1] - 1):  # boundary effects
    for c in range(1, data.shape[2] - 1):
        lag[:, r, c] = nn3d(data, (r, c))
What you are looking for is probably np.nditer:
a = np.arange(6).reshape(2, 3)
for x in np.nditer(a):
    print(x, end=' ')
which prints
0 1 2 3 4 5

How to identify subtriangle within a rectangle given a coordinate in that rectangle

Given a rectangle of width w and height h, divided into four triangles by its two diagonals, and a coordinate (x, y) in that rectangle, I would like to identify which triangle I am within.
i.e. the function should take parameters (x, y) and return A, B, C or D, or a zero-based index representing that triangle (0=A, 1=B, 2=C, 3=D if they are in that order).
I think this would be something like >= the formula of the red line and >= the formula of the green line?
I'd like to implement this in VB.NET
aboveRed = x*h > y*w;
aboveGreen = (w-x)*h > y*w;
if (aboveRed)
{
    if (aboveGreen) return "C"; else return "B";
}
else
{
    if (aboveGreen) return "D"; else return "A";
}
Equation of green line: h * x + w * y = h * w
Equation of red line: x * h - y * w = 0
Public Function GetTriangleNumber(ByVal x As Integer, ByVal y As Integer) As Integer
    Dim overGreenLine As Boolean = ((((h * x) + (w * y)) - (h * w)) < 0)
    Dim overRedLine As Boolean = (((h * x) - (w * y)) > 0)
    If overGreenLine Then
        Return IIf(overRedLine, 2, 3)
    End If
    Return IIf(overRedLine, 1, 0)
End Function
I would consider the angle of the line to the point from the top-left and top-right corners. If, in both cases, it is less than the angle of the corresponding diagonal (45 degrees for a square, adjusting for the base direction of the edge), then the point is in C. Other combinations cover the other three triangles.
You don't actually need to calculate inverse trig functions to do this, as the ratio of the side lengths gives you enough information (the diagonal's slope h/w is a fixed value you can compare against directly).
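For illustration, the same corner-angle test expressed purely with ratio comparisons, written in Python for brevity; the labels assume the A/B/C/D layout used in the first answer above:
def triangle(x, y, w, h):
    # compare slopes instead of angles: no trig needed
    above_red = x * h > y * w           # above the diagonal from one corner
    above_green = (w - x) * h > y * w   # above the diagonal from the other corner
    if above_red and above_green:
        return "C"
    if above_red:
        return "B"
    if above_green:
        return "D"
    return "A"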