How To Choose Subsets To Form A Matrix Of Combinations - optimization

I have created (after a few months of learning VBA and then switching to Python) a script that uses backtracking to select the combinations of n elements taken k at a time, arranging them in an order such that each group of them forms one of the combinations of the same n elements at a larger size.
Put another way: I create a matrix of 5 supersets by doing 5C4. I create a list of 10 subsets by doing 5C2. I then use backtracking to choose the subsets that form each superset.
Example using [A, B, C, D, E]:
5C4 = 5 supersets -> [ABCD,
ABCE,
ABDE,
ACDE,
BCDE]
Likewise, 5C2 = 10 subsets -> [AB,
AC,
AD,
AE,
BC,
BD,
BE,
CD,
CE,
DE]
Finally, pairing the subsets into 2 groups per superset (5 supersets * 2 groups = 10 subsets) results in something like:
[[(A, B), (C, D)]
[(A, C), (D, E)]
[(A, D), (B, E)]
[(A, E), (B, C)]
[(B, D), (C, E)]]
Granted, a few different matrix solutions can arise depending on the order of the 5C2 list, but I have spent way too much time trying to figure out the correct/general way of doing this, which seems like unnecessary information here. I have also gone down the rabbit hole of making sure I can arrange the subsets so that each element of result[row][col][element] appears in each column a specific number of times; once I find a faster way of selecting subsets, I will reorder them to achieve this (as a bonus, if anyone knows how to do that, I am thinking permutations). Did you know that you can read the occurrence count of each element off Pascal's triangle? But I digress.
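(As a quick check of that Pascal's triangle observation, here is a minimal sketch using math.comb from Python 3.8+: each element appears C(n-1, k-1) times across the C(n, k) subsets.)
from itertools import combinations
from math import comb  # Python 3.8+

n, k = 5, 2
subsets = list(combinations(range(n), k))
for element in range(n):
    # each element appears comb(n - 1, k - 1) = 4 times among the 10 pairs
    assert sum(element in s for s in subsets) == comb(n - 1, k - 1)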
Backtracking works for smaller numbers, but if I do something larger, let's say 17C15 (superset) = 17C3 (subset) * 5 (groups), the code runs far too long (I stopped it after 40+ hours).
Here is the logic behind why I think this should work in these examples:
5C2 = 10
5C4 = 5
5C4 * 2(groups) = 10
17C3 = 680
17C15 = 136
17C15 * 5(groups) = 680
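These identities are easy to sanity-check (a minimal sketch using math.comb, Python 3.8+):
from math import comb

for n, k_small, k_big in [(5, 2, 4), (17, 3, 15)]:
    groups = k_big // k_small
    # 10 == 5 * 2 and 680 == 136 * 5
    assert comb(n, k_small) == comb(n, k_big) * groups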
1.) Is my logic correct? I considered that when choosing n, we need k_1 < n/2, k_2 > n/2, and k_1 | k_2 (k_2 a multiple of k_1).
2.) If this is the most practical way of choosing subsets, then can anyone help optimize my code?
from itertools import combinations, chain

families = range(1, 6)
choose = 2
groups = 2
size = int(choose * groups)
teams = list(combinations(families, choose))
cluster = list(combinations(families, size))
team_len = int(len(teams) / groups)
zero_set = [(0) for _ in range(choose)]

def check(grid, row, column, subset):
    # check if combo (subset) is a subset of cluster's row
    if (set(subset) & set(cluster[row])) != set(subset):
        return False
    used = set(x for x in chain(*grid[row]))
    if set(tuple(x) for x in grid[row]).intersection(set(subset)):
        return False
    if used.intersection(set(subset)):
        return False
    for x in range(team_len):
        for y in range(groups):
            # check if used already
            if grid[x][y] == subset:
                return False
    return True

def solve(grid, row, column):
    if row == team_len and column == groups - 1:
        return True
    if row == team_len:
        column += 1
        row = 0
    if grid[row][column] > zero_set:
        return solve(grid, row, column + 1)
    for subset in teams:
        if check(grid, row, column, subset):
            grid[row][column] = subset
            if solve(grid, row + 1, column):
                return True
            grid[row][column] = zero_set
    return False

grid = [[zero_set] * groups for _ in range(team_len)]
if solve(grid, 0, 0):
    print("success")
    for x in grid:
        print(x)
print("finished")
This returns:
success
[(1, 2), (3, 4)]
[(1, 3), (2, 5)]
[(1, 5), (2, 4)]
[(1, 4), (3, 5)]
[(2, 3), (4, 5)]
finished
The success criterion would be to have all 680 combinations used exactly once (no repetitions) in the final matrix, grid, for:
families = range(1, 18)
choose = 3
groups = 5
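Once the solver finishes, a helper along these lines could confirm the success criterion (a sketch; verify and its arguments are hypothetical, not part of the script above):
from itertools import chain, combinations

def verify(grid, families, choose):
    # flatten the grid across rows and groups
    used = list(chain.from_iterable(grid))
    expected = set(combinations(families, choose))
    # every subset used exactly once: right count and no repetitions
    return len(used) == len(expected) and set(used) == expected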

Related

Find pairs of array such as array_1 = -array_2

I am looking for a way to find all the vectors from an np.meshgrid(xrange, xrange, xrange) that are related by k = -k.
For the moment I do this:
import numpy as np
import numba

@numba.njit
def find_pairs(array):
    boolean = np.ones(len(array), dtype=np.bool_)
    pairs = []
    idx = [i for i in range(len(array))]
    while len(idx) > 1:
        e1 = idx[0]
        for e2 in idx:
            if (array[e1] == -array[e2]).all():
                boolean[e2] = False
                pairs.append([e1, e2])
                idx.remove(e1)
                if e2 != e1:
                    idx.remove(e2)
                break
    return boolean, pairs

# Give array of 3D vectors
krange = np.fft.fftfreq(N)
comb_array = np.array(np.meshgrid(krange, krange, krange)).T.reshape(-1, 3)

# Take the indices of the (k, -k) pairs and a boolean selection giving the positions of the -k vectors
boolean, pairs = find_pairs(comb_array)
It works, but the execution time grows rapidly with N...
Maybe someone has already dealt with this?
The main problem is that comb_array has a shape of (R, 3) where R = N**3, and the nested loop in find_pairs runs in at least quadratic time, since idx.remove runs in linear time and is called inside the for loop. Moreover, there are cases where the for loop does not change the size of idx, so the loop appears to run forever (e.g. with N=4).
One way to solve this problem in O(R log R) is to sort the array and then check for opposite values in linear time:
import numpy as np
import numba as nb

N = 16  # example value; N is the grid size from the question

# Give array of 3D vectors
krange = np.fft.fftfreq(N)
comb_array = np.array(np.meshgrid(krange, krange, krange)).T.reshape(-1, 3)

# Sorting
packed = comb_array.view([('x', 'f8'), ('y', 'f8'), ('z', 'f8')])
idx = np.argsort(packed, axis=0).ravel()
sorted_comb = comb_array[idx]

# Find pairs
@nb.njit
def findPairs(sorted_comb, idx):
    n = idx.size
    boolean = np.zeros(n, dtype=np.bool_)
    pairs = []
    cur = n - 1
    for i in range(n):
        while cur >= i:
            if np.all(sorted_comb[i] == -sorted_comb[cur]):
                boolean[idx[i]] = True
                pairs.append([idx[i], idx[cur]])
                cur -= 1
                break
            cur -= 1
    return boolean, pairs

findPairs(sorted_comb, idx)
Note that the algorithm assumes that each row has at most one valid matching pair. If there are several equal rows, they are paired two by two. If your goal is to extract all combinations of equal rows in that case, then note that the output can grow exponentially (which is not reasonable IMHO).
This solution is pretty fast even for N = 100. Most of the time is spent in the sort, which is not very efficient (unfortunately NumPy does not yet provide an efficient lexicographic argsort of rows, though this operation is fundamentally expensive).
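Since the sort dominates the run time, an alternative worth sketching is a hash-based lookup that pairs each row with its negation in expected O(R) time (not from the original answer; it assumes the fftfreq values are exact so that -v compares equal to a stored row, and it runs in plain Python rather than Numba):
import numpy as np

def find_pairs_hash(comb_array):
    # map each row (as a tuple) to its index, then look up the negated row
    index_of = {tuple(row): i for i, row in enumerate(comb_array)}
    boolean = np.zeros(len(comb_array), dtype=np.bool_)
    pairs = []
    for i, row in enumerate(comb_array):
        j = index_of.get(tuple(-row))
        if j is not None and j >= i:  # record each (k, -k) pair only once
            boolean[i] = True
            pairs.append([i, j])
    return boolean, pairs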

About the numpy.where statement

I would like to use numpy.where to check the value of a previous row, but I don't know how to code it:
for n1 in range(len(image1)):
    print('input image ', input_folder + '\\' + image1[n1])
    print('\n')
    print('image1[n1] ', image1[n1])
    print('\n')
    im = Image.open(input_folder + '\\' + image1[n1])
    a = np.array(im, dtype='uint8')
    width, height = im.size
    print('width ', width)
    print('height ', height)
    a = np.where(a == [0, 0, 0], [255, 255, 255], a)
    # I would like to replace the loops below with something like
    # np.where(a[-??] == [255, 255, 255] or a[+??] == [255, 255, 255])
    # so it runs much faster than the for loop statements.
    for h in range(height):
        for w in range(width):
            if h <= (height - 2) and w <= (width - 2):
                if a[h, w, 0] != 255 and a[h, w, 1] != 255 and a[h, w, 2] != 255:
                    if ((a[h-1, w, 0] == 255 and a[h-1, w, 1] == 255 and a[h-1, w, 2] == 255 and
                         a[h+1, w, 0] == 255 and a[h+1, w, 1] == 255 and a[h+1, w, 2] == 255) or
                        (a[h, w-1, 0] == 255 and a[h, w-1, 1] == 255 and a[h, w-1, 2] == 255 and
                         a[h, w+1, 0] == 255 and a[h, w+1, 1] == 255 and a[h, w+1, 2] == 255)):
                        a[h, w, 0] = 255
                        a[h, w, 1] = 255
                        a[h, w, 2] = 255
I'm afraid you can not use np.where here.
The reason is that the condition passed to np.where should address each individual element of the source array, whereas the criterion in your code actually relates only to the first two dimensions of the source array.
So I came up with another, quite elegant and concise solution.
Part 1: How to get the first two indices of elements where all values along the third dimension are != 255.
To do it on the whole array, you could run:
np.not_equal(a, 255).all(axis=2)
Part 2: How to limit the "range of operation" to elements having both a previous and a next row and column.
You can do it by passing a "subrange" of the original array to the above code:
np.not_equal(a[1:-1, 1:-1], 255).all(axis=2)
You should eliminate both the first and the last row and column (in your code you failed to eliminate the first row / column).
But note that this time the resulting indices are smaller by one, so at a later step you will have to add 1 to them.
Part 3: A function to check whether all elements along the third dimension are == 255, for some row (r) and column (c); it will be used soon:
def all_eq(arr, r, c):
    return np.equal(arr[r, c], 255).all()
Part 4: How to get the result:
res = a.copy()
for r, c in zip(*np.where(np.not_equal(a[1:-1, 1:-1], 255).all(axis=2))):
    h = r + 1
    w = c + 1
    if all_eq(a, h-1, w) and all_eq(a, h+1, w) or \
       all_eq(a, h, w-1) and all_eq(a, h, w+1):
        res[h, w] = 255
Note that this code starts by making a copy of the original array (it will hold the result).
Then for r, c in zip(…) iterates over the indices found.
The first two lines in the loop add 1 to the indices found in the subrange of the original array, so that h and w now indicate a row / column in the whole original array.
Then if checks whether the respective adjacent pixels have 255 in all elements.
If they do, then 255 is put in all elements of the "current" pixel, in the result.
You can't operate on the original array, since changed values in some pixels would "falsify" the evaluation of the conditions for subsequent pixels.
Edit
After some research I found that it is possible to use np.where, although the solution is a bit complicated and involves quite a number of NumPy methods:
# Mask 1: Pixels with all elements != 255
m1 = np.zeros((height, width), dtype='int8')
idx = np.where(np.not_equal(a, 255).all(axis=2))
m1[idx] = 1
# Pixels with all elements == 255
m2 = np.apply_along_axis(lambda px: np.equal(px, 255).all(), 2, a).astype('int8')
# Both adjacent pixels (left / right) == 255
m2a = np.logical_and(np.insert(m2, 0, 0, axis=1)[:, :-1],
                     np.insert(m2, width, 0, axis=1)[:, 1:])
# Both adjacent pixels (up / down) == 255
m2b = np.logical_and(np.insert(m2, 0, 0, axis=0)[:-1, :],
                     np.insert(m2, height, 0, axis=0)[1:, :])
# Mask 2: Both adjacent pixels (either vertically or horizontally) == 255
m2 = np.logical_or(m2a, m2b)
# The "final" mask
msk = np.logical_and(m1, m2)
# Generate the result
result = np.where(np.expand_dims(msk, 2), 255, a)
This solution should be substantially faster than my first concept.
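For what it's worth, the same masks can also be built with plain slicing instead of np.insert, avoiding np.apply_along_axis (which loops in Python and is comparatively slow); a sketch under the same conventions (a is the H x W x 3 image array):
white = (a == 255).all(axis=2)     # pixels with all channels == 255
nonwhite = (a != 255).all(axis=2)  # pixels with no channel == 255
m = np.zeros_like(white)
m[1:-1, :] |= white[:-2, :] & white[2:, :]  # white above and below
m[:, 1:-1] |= white[:, :-2] & white[:, 2:]  # white left and right
result = np.where((m & nonwhite)[..., None], 255, a)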

batched tensor slice, slice B x N x M with B x 1

I have a B x M x N tensor, X, and a B x 1 tensor, Y, which holds the index along dimension 1 of X that I want to keep. What is the shorthand for this slice so that I can avoid a loop?
Essentially I want to do this:
Z = torch.zeros(B, N)
for i in range(B):
    Z[i] = X[i][Y[i]]
The following code is similar to the loop above. The difference is that instead of indexing the arrays Z, X and Y sequentially, we index them in parallel using the index array i:
import numpy as np

B, M, N = 13, 7, 19
X = np.random.randint(100, size=[B, M, N])
Y = np.random.randint(M, size=[B, 1])
Z = np.random.randint(100, size=[B, N])

i = np.arange(B)
Y = Y.ravel()  # reduce to rank 1 for easy indexing
Z[i] = X[i, Y[i], :]
This code can be further simplified:
-> Z[i] = X[i,Y[i],:]
-> Z[i] = X[i,Y[i]]
-> Z[i] = X[i,Y]
-> Z = X[i,Y]
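A quick way to convince yourself that the one-liner matches the loop (a self-contained sketch):
import numpy as np

B, M, N = 13, 7, 19
X = np.random.randint(100, size=[B, M, N])
Y = np.random.randint(M, size=[B, 1])

Z_fancy = X[np.arange(B), Y.ravel()]
Z_loop = np.array([X[i, Y[i, 0]] for i in range(B)])
assert (Z_fancy == Z_loop).all()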
The equivalent PyTorch code:
import torch

B, M, N = 5, 7, 3
X = torch.randint(100, size=[B, M, N])
Y = torch.randint(M, size=[B, 1])
Z = torch.randint(100, size=[B, N])

i = torch.arange(B)
Y = Y.ravel()
Z = X[i, Y]
The answer provided by @Hammad is short and perfect for the job. Here's an alternative solution if you're interested in using some lesser-known PyTorch built-ins. We will use torch.gather (similarly, you can achieve this in NumPy with numpy.take_along_axis).
The idea behind torch.gather is to construct a new tensor based on an index tensor (here ~ Y) and a source tensor of values (here ~ X).
The operation performed is Z[i][j][k] = X[i][Y[i][j][k]][k].
Since X's shape is (B, M, N) and Y's shape is (B, 1), we want to expand Y so that its shape becomes (B, 1, N).
This can be achieved with some axis manipulation:
>>> Y.expand(-1, N)[:, None]  # expand the singleton dim to N, then unsqueeze dim=1
The actual call to torch.gather will be:
>>> X.gather(dim=1, index=Y.expand(-1, N)[:, None])
which you can reduce to shape (B, N) by appending [:, 0].
This function can be very effective in tricky scenarios...
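Putting the pieces together, here is a minimal end-to-end sketch of the gather approach, checked against the explicit loop:
import torch

B, M, N = 5, 7, 3
X = torch.randint(100, (B, M, N))
Y = torch.randint(M, (B, 1))

# index shape (B, 1, N); result squeezed back to (B, N)
Z = X.gather(dim=1, index=Y.expand(-1, N)[:, None])[:, 0]

Z_loop = torch.stack([X[i, Y[i, 0]] for i in range(B)])
assert torch.equal(Z, Z_loop)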

In Tensorflow, is there a built in function to compute states over time given a transition matrix?

I have a system given by this recursive relationship: x_t = A_t x_{t-1} + b_t. I wish to compute x_t for all t, with A_t, b_t and x_0 given. Is there a built-in function for that? If I use a loop it will be extremely slow. Thanks!
There is sort of a way. Let's say you have your A matrices in a 3D tensor with shape (T, N, N), where T is the total number of time steps and N is the size of your vector. Similarly, the B values are in a 2D tensor of shape (T, N). The first step of the computation would be:
x1 = A[0] @ x0 + B[0]
where @ represents the matrix product. You can convert this into a single matrix product. Suppose we add a value 1 at the end of x0, and call the result x0p (for "prime"):
x0p = tf.concat([x0, [1]], axis=0)
And now we build a new 3D tensor Ap with shape (T, N+1, N+1), such that for each A[i] we concatenate B[i] as a new column, and then we add a row with N zeros and a single one at the end:
AwithB = tf.concat([A, tf.expand_dims(B, 2)], axis=2)
AnewRow = tf.concat([tf.zeros((T, 1, N), A.dtype), tf.ones((T, 1, 1), A.dtype)], axis=2)
Ap = tf.concat([AwithB, AnewRow], axis=1)
As it turns out, you can now say:
x1p = Ap[0] @ x0p
And therefore:
x2p = Ap[1] @ x1p = Ap[1] @ Ap[0] @ x0p
So we just need to compute the product of all the matrices in Ap across the first dimension. Unfortunately, there does not seem to be a direct operation in TensorFlow to compute that, but you can do it relatively fast with tf.scan:
Ap_prod = tf.scan(tf.matmul, Ap)[-1]
And with that you just have to do:
xtp = Ap_prod @ x0p
Here is a proof of concept (the code is tweaked to support single examples as well as batches, either in the A and B values or in x):
import tensorflow as tf

def compute_state(a, b, x):
    s = tf.shape(a)
    t = s[-3]
    n = s[-1]
    # Add final 1 to x
    xp = tf.concat([x, tf.ones_like(x[..., :1])], axis=-1)
    # Add B column to A
    a_b = tf.concat([a, tf.expand_dims(b, axis=-1)], axis=-1)
    # Make new final row for A
    a_row = tf.concat([tf.zeros_like(a[..., :1, :]),
                       tf.ones_like(a[..., :1, :1])], axis=-1)
    # Add new row to A
    ap = tf.concat([a_b, a_row], axis=-2)
    # Compute matrix product reduction
    ap_prod = tf.scan(tf.matmul, ap)[..., -1, :, :]
    # Compute final result
    outp = tf.linalg.matvec(ap_prod, xp)
    return outp[..., :-1]

# Test
tf.random.set_seed(0)
a = tf.random.uniform((10, 5, 5), -1, 1)
b = tf.random.uniform((10, 5), -1, 1)
x = tf.random.uniform((5,), -1, 1)
y = compute_state(a, b, x)

# Also works with batches of (a, b) or x
a = tf.random.uniform((100, 10, 5, 5), -1, 1)
b = tf.random.uniform((100, 10, 5), -1, 1)
x = tf.random.uniform((100, 5), -1, 1)
y = compute_state(a, b, x)
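If you want all the states x_1 ... x_T rather than only the last one, tf.scan can also apply the recurrence directly, without the homogeneous-coordinate trick (a sketch for the unbatched case, with A of shape (T, N, N), B of shape (T, N) and x0 of shape (N,)):
import tensorflow as tf

def compute_states_direct(A, B, x0):
    def step(x_prev, ab):
        a_t, b_t = ab
        # one step of the recurrence: x_t = A_t @ x_{t-1} + b_t
        return tf.linalg.matvec(a_t, x_prev) + b_t
    return tf.scan(step, (A, B), initializer=x0)  # shape (T, N): x_1 .. x_T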

Iterator with memory?

I'm working on an application which uses a Markov chain.
An example of this code follows:
chain = MarkovChain(order=1)
train_seq = ["", "hello", "this", "is", "a", "beautiful", "world"]
for i, word in enumerate(train_seq):
    chain.train(previous_state=train_seq[i-1], next_state=word)
What I am looking for is to iterate over train_seq, but keeping the N last elements:
for states in unknown(train_seq, order=1):
    # states should be a list of states, with states[-1] the newest word
    # and states[:-1] the previous occurrences of the iteration.
    chain.train(*states)
Hope the description of my problem is clear enough.
window will give you n items from iterable at a time.
from collections import deque

def window(iterable, n=3):
    it = iter(iterable)
    d = deque(maxlen=n)
    for elem in it:
        d.append(elem)
        yield tuple(d)

print [x for x in window([1, 2, 3, 4, 5])]
# [(1,), (1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
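Tying window back to the original training loop (a sketch; MarkovChain and its train signature come from the question and are assumed here):
chain = MarkovChain(order=1)
train_seq = ["", "hello", "this", "is", "a", "beautiful", "world"]
for states in window(train_seq, n=2):
    if len(states) == 2:  # skip the initial shorter window
        chain.train(previous_state=states[0], next_state=states[1])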
If you want the same number of items even for the first few windows,

from collections import deque

def window(iterable, n=3):
    it = iter(iterable)
    d = deque((next(it) for _ in range(n - 1)), n)
    for elem in it:
        d.append(elem)
        yield tuple(d)

print [x for x in window([1, 2, 3, 4, 5])]
will do that.
seq = [1, 2, 3, 4, 5, 6, 7]
for w in zip(seq, seq[1:]):
    print w

You can also do the following to create arbitrarily-sized tuples:

tuple_size = 2
for w in zip(*(seq[i:] for i in range(tuple_size))):
    print w
edit: But it's probably better to use the iterator-based izip:

from itertools import izip

tuple_size = 4
for w in izip(*(seq[i:] for i in range(tuple_size))):
    print w
I tried this on my system with seq being 10,000,000 integers and the results were fairly instant.
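(On Python 3, izip is just the built-in zip, and from Python 3.10 the tuple_size=2 case is available directly as itertools.pairwise:)
from itertools import pairwise  # Python 3.10+

seq = [1, 2, 3, 4, 5, 6, 7]
for w in pairwise(seq):
    print(w)  # (1, 2), (2, 3), ...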
Improving upon yan's answer to avoid copies:
from itertools import izip, tee

def staggered_iterators(sequence, count):
    iterator = iter(sequence)
    for i in xrange(count):
        result, iterator = tee(iterator)
        yield result
        next(iterator)

tuple_size = 4
for w in izip(*staggered_iterators(seq, tuple_size)):
    print w