Given I have two arrays, say A (shape K,L,M) and B (shape K,M).
I want to iterate vectorwise and construct an output C (shape equal to A) by running a function f on each input vector a and scalar b, then reassembling the results into the output: c = f(a, b), where a = A[i, :, j], b = B[i, j], and c has the same shape as a. In this example the vector axis is axis 1 of A, but in general it could be any other axis.
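Spelled out as a plain reference loop (a minimal sketch; apply_vectorwise and the example f are illustrative names, not part of the original question):

import numpy as np

def apply_vectorwise(A, B, f):
    # C[i, :, j] = f(A[i, :, j], B[i, j]) for every i, j
    C = np.empty_like(A)
    K, L, M = A.shape
    for i in range(K):
        for j in range(M):
            C[i, :, j] = f(A[i, :, j], B[i, j])
    return C

# e.g. apply_vectorwise(A, B, lambda a, b: a * b)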
After reading the documentation page of nditer, I thought it should be appropriate and elegant to use here, since apparently it can allocate everything for you, allows a separate external loop, and easily allows reassembly of the output.
However, I am unable to even make something as simple as a vector-wise copy (again along axis) of an existing array using nditer work properly. Is what I want to do simply not possible with nditer or am I using it wrong?
def test(arr, offsets, axis=0):
    #out = np.zeros_like(arr)
    with np.nditer([arr, None], flags=['external_loop'], #[arr, out]
                   op_flags=[['readonly'], ['writeonly', 'allocate']],
                   op_axes=[[axis], None], #[[axis], [axis]]
                   ) as ndit:
        for i, o in ndit:
            print(i.shape, o.shape)
            o[...] = i
        return ndit.operands[1]
tested = test(xam.data, shifts, axis=1)
print('test output shape', tested.shape)
>>> (<L>,) (<L>,)
>>> test output shape (<L>,)
This gives an output containing only the very first input vector. Even if I explicitly pass an output with the same shape as the input (e.g. via the commented-out changes), the nditer still runs only once, on the very first length-L vector.
>>> (<L>,) (<L>,)
>>> test output shape (<N>, <L>, <M>)
I have made an alternative version using rollaxis views, but it is not particularly pretty or intuitive, so I was wondering if it should not also be possible with nditer, somehow...
def test2(arr, offsets, axis=0):
    arr_r = np.rollaxis(arr, axis).reshape((arr.shape[axis], -1)).T
    out = np.zeros_like(arr)
    out_r = np.rollaxis(out, axis).reshape((arr.shape[axis], -1)).T # create view
    for i, o in zip(arr_r, out_r):
        o[...] = i
    return out
Changing your function to work with a list/tuple of axes:
In [378]: def test(arr, offsets, axis=0):
...: #out = np.zeros_like(arr)
...: with np.nditer([arr, None],flags=['external_loop'], #[arr, out]
...: op_flags=[['readonly'], ['writeonly', 'allocate']],
...: op_axes=[axis, None], #[[axis], [axis]]
...: ) as ndit:
...: for i, o in ndit:
...: print(i.shape, o.shape)
...: print(i)
...: o[...] = i
...: return ndit.operands[1]
...:
Now it iterates over the whole 2d array. With external_loop it passes the whole (flattened) array in one chunk.
In [379]: test(np.arange(12).reshape((3,4)),0,axis=[0,1])
(12,) (12,)
[ 0 1 2 3 4 5 6 7 8 9 10 11]
Out[379]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [380]: test(np.arange(12).reshape((3,4)),0,axis=[1,0])
(12,) (12,)
[ 0 1 2 3 4 5 6 7 8 9 10 11]
Out[380]:
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
test2
Adding the print to test2 to better see what's passed:
In [385]: def test2(arr, offsets, axis=0):
...: arr_r = np.rollaxis(arr, axis).reshape((arr.shape[axis], -1)).T
...: out = np.zeros_like(arr)
...: out_r = np.rollaxis(out, axis).reshape((arr.shape[axis], -1)).T # create view
...: for i, o in zip(arr_r, out_r):
...: print(i.shape, i)
...: o[...] = i
...: return out
...:
In [386]: test2(np.arange(12).reshape((3,4)),0,axis=0)
(3,) [0 4 8]
(3,) [1 5 9]
(3,) [ 2 6 10]
(3,) [ 3 7 11]
Out[386]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [387]: test2(np.arange(12).reshape((3,4)),0,axis=1)
(4,) [0 1 2 3]
(4,) [4 5 6 7]
(4,) [ 8 9 10 11]
Out[387]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
timings
Taking out the prints to do timings (test0, test20, and test01 below are print-free versions of test and test2; test01 additionally drops external_loop):
nditer:
In [391]: timeit test0(np.arange(12).reshape((3,4)),0,axis=(0,1))
11.6 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
iteration:
In [392]: timeit test20(np.arange(12).reshape((3,4)),0,axis=0)
26.5 µs ± 732 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
nditer, but without external_loop:
In [395]: timeit test01(np.arange(12).reshape((3,4)),0,axis=(0,1))
17.9 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Often in time tests nditer performs slower. Here, though, the external_loop case only has to iterate once, passing the whole flattened array to the loop body.
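For reference, test01 is not shown above; a sketch of what the no-external_loop variant presumably looks like (my reconstruction, not the original code):

import numpy as np

def test01(arr, offsets, axis=(0, 1)):
    with np.nditer([arr, None],
                   op_flags=[['readonly'], ['writeonly', 'allocate']],
                   op_axes=[list(axis), None],
                   ) as ndit:
        for i, o in ndit:  # without external_loop, i and o are 0-d scalars
            o[...] = i
        return ndit.operands[1]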
big picture
So far we are just trying to iterate through a 2d array. In the intro you talk of using
A (shape K,L,M) and B (shape K,M).
Normally in numpy we try to avoid any iteration. If B is viewed as (K,1,M) (as with B[:,None,:]), then we can do all kinds of things with the two arrays:
C = A + B[:,None]
C = A * B[:,None]
without needing to iterate. Any Python-level iteration over arrays slows down the code.
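For instance, with f being elementwise multiplication, the whole computation described in the intro collapses to one broadcast (a small added illustration):

import numpy as np
K, L, M = 2, 3, 4
A = np.arange(K * L * M).reshape(K, L, M)
B = np.arange(K * M).reshape(K, M)
C = A * B[:, None, :]   # (K,1,M) broadcasts along the L axis
# equivalent to C[i, :, j] = A[i, :, j] * B[i, j] for every i, j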
I have an array like this:
([ 1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5 ])
I need to find all duplicated sequences: not repeated values, but sequences of at least two consecutive values.
The result should be like this:
of length 2: [1, 5] with indexes (0, 16);
of length 3: [3, 3, 7] with indexes (6, 12); [7, 9, 4] with indexes (2, 8)
Long sequences should be excluded if they are not themselves duplicated: [5, 5, 5, 5] should NOT be taken as [5, 5] at indexes (0, 1, 2)! It's not a duplicated sequence, it's one long sequence.
I can do it with the pandas.apply function, but it is too slow, and swifter did not help me.
And in real life I need to find all of them, with lengths from 10 up to 100 values, in a database of 1500 columns with 700,000 values each. So I really do need a vectorized solution.
Is there a vectorized solution for finding all of them at once? Or at least for finding only 10-value sequences? Or only 4-value sequences? Anything that is fully vectorized?
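To pin down the spec, here is a minimal non-vectorized baseline (an added illustration; baseline is a hypothetical name and it does not implement the long-run exclusion):

import numpy as np
from collections import defaultdict

def baseline(arr, n):
    # Map each length-n window (as a tuple) to its starting indices
    seen = defaultdict(list)
    for i in range(arr.size - n + 1):
        seen[tuple(arr[i:i + n])].append(i)
    return {seq: idx for seq, idx in seen.items() if len(idx) > 1}

a = np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5])
print(baseline(a, 3))  # {(7, 9, 4): [2, 8], (3, 3, 7): [6, 12]}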
One possible implementation (although not fully vectorized) that finds all sequences of size n that appear more than once is the following:
import numpy as np

def repeated_sequences(arr, n):
    Na = arr.size
    r_seq = np.arange(n)
    n_seqs = arr[np.arange(Na - n + 1)[:, None] + r_seq]
    unique_seqs = np.unique(n_seqs, axis=0)
    comp = n_seqs == unique_seqs[:, None]
    M = np.all(comp, axis=-1)
    if M.any():
        matches = np.array(
            [np.convolve(M[i], np.ones((n), dtype=int)) for i in range(M.shape[0])]
        )
        repeated_inds = np.count_nonzero(matches, axis=-1) > n
        repeated_matches = matches[repeated_inds]
        idxs = np.argwhere(repeated_matches > 0)[::n]
        grouped_idxs = np.split(
            idxs[:, 1], np.unique(idxs[:, 0], return_index=True)[1][1:]
        )
    else:
        return [], []
    return unique_seqs[repeated_inds], grouped_idxs
In theory, you could replace
matches = np.array(
    [np.convolve(M[i], np.ones((n), dtype=int)) for i in range(M.shape[0])]
)
with
matches = scipy.signal.convolve(
    M, np.ones((1, n), dtype=int), mode="full"
).astype(int)
which would make the whole thing "fully vectorized", but my tests showed that this was 3 to 4 times slower than the for-loop, so I'd stick with the loop. Or simply,
matches = np.apply_along_axis(np.convolve, -1, M, np.ones((n), dtype=int))
which does not have any significant speed-up, since it's basically a hidden loop (see this).
This is based off @Divakar's answer here, which dealt with a very similar problem in which the sequence to look for was provided. I simply made it follow that procedure for all possible sequences of size n, which are found inside the function with n_seqs = arr[np.arange(Na - n + 1)[:, None] + r_seq]; unique_seqs = np.unique(n_seqs, axis=0).
For example,
>>> n = 3
>>> a = np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5])
>>> repeated_seqs, inds = repeated_sequences(a, n)
>>> for i, seq in enumerate(repeated_seqs[:10]):
...: print(f"{seq} with indexes {inds[i]}")
...:
[3 3 7] with indexes [ 6 12]
[7 9 4] with indexes [2 8]
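As a side note, on numpy >= 1.20 the window construction can also be done with a stride-trick view instead of fancy indexing, which avoids a copy (an added sketch; same values as the n_seqs line above, but as a read-only view):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5])
n_seqs = sliding_window_view(a, 3)   # shape (Na - n + 1, n)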
Disclaimer
Long sequences should be excluded if they are not themselves duplicated: [5, 5, 5, 5] should NOT be taken as [5, 5] at indexes (0, 1, 2)! It's not a duplicated sequence, it's one long sequence.
This is not directly taken into account, and the sequence [5, 5] would appear more than once according to this algorithm. You could do something like the following, based off @Paul's answer here, but it involves a loop:
import numpy as np
repeated_matches = np.array([[0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
idxs = np.argwhere(repeated_matches > 0)
grouped_idxs = np.split(
idxs[:, 1], np.unique(idxs[:, 0], return_index=True)[1][1:]
)
>>> print(grouped_idxs)
[array([ 6, 7, 8, 12, 13, 14], dtype=int64),
array([ 7, 8, 9, 10], dtype=int64)]
# If there are consecutive numbers in grouped_idxs, that means that there is a long
# sequence that should be excluded. So, you'd have to check for consecutive numbers
filtered_idxs = []
for idx in grouped_idxs:
    if not all((idx[1:] - idx[:-1]) == 1):
        filtered_idxs.append(idx)
>>> print(filtered_idxs)
[array([ 6, 7, 8, 12, 13, 14], dtype=int64)]
Some tests:
>>> n = 3
>>> a = np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5])
>>> %timeit repeated_sequences(a, n)
414 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> n = 4
>>> a = np.random.randint(0, 10, (10000,))
>>> %timeit repeated_sequences(a, n)
3.88 s ± 54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> result, _ = repeated_sequences(a, n)
>>> result.shape
(2637, 4)
This is not the most efficient implementation by far, but it works as a 2D approach. Plus, if there aren't any repeated sequences, it returns empty lists.
EDIT: Full implementation
I vectorized the routine I added in the Disclaimer section as a possible solution to the long sequence problem and ended up with the following:
import numpy as np

# Taken from:
# https://stackoverflow.com/questions/53051560/stacking-numpy-arrays-of-different-length-using-padding
def stack_padding(it):
    def resize(row, size):
        new = np.array(row)
        new.resize(size)
        return new

    row_length = max(it, key=len).__len__()
    mat = np.array([resize(row, row_length) for row in it])
    return mat

def repeated_sequences(arr, n):
    Na = arr.size
    r_seq = np.arange(n)
    n_seqs = arr[np.arange(Na - n + 1)[:, None] + r_seq]
    unique_seqs = np.unique(n_seqs, axis=0)
    comp = n_seqs == unique_seqs[:, None]
    M = np.all(comp, axis=-1)

    repeated_seqs = []
    idxs_repeated_seqs = []
    if M.any():
        matches = np.apply_along_axis(np.convolve, -1, M, np.ones((n), dtype=int))
        repeated_inds = np.count_nonzero(matches, axis=-1) > n
        if repeated_inds.any():
            repeated_matches = matches[repeated_inds]
            idxs = np.argwhere(repeated_matches > 0)
            grouped_idxs = np.split(
                idxs[:, 1], np.unique(idxs[:, 0], return_index=True)[1][1:]
            )
            # Additional routine
            # Pad this uneven array with zeros so that we can use it normally
            grouped_idxs = np.array(grouped_idxs, dtype=object)
            padded_idxs = stack_padding(grouped_idxs)
            # Find the indices where there are padded zeros
            pad_positions = padded_idxs == 0
            # Perform the "consecutive-numbers check" (this will take one
            # item off the original array, so we have to correct for its shape).
            idxs_to_remove = np.pad(
                (padded_idxs[:, 1:] - padded_idxs[:, :-1]) == 1,
                [(0, 0), (0, 1)],
                constant_values=True,
            )
            pad_positions = np.argwhere(pad_positions)
            i = pad_positions[:, 0]
            j = pad_positions[:, 1] - 1  # Shift by one (shape correction)
            idxs_to_remove[i, j] = True  # Masking, since we don't want pad indices
            # Obtain a final mask (boolean opposite of indices to remove)
            final_mask = ~idxs_to_remove.all(axis=-1)
            grouped_idxs = grouped_idxs[final_mask]  # Filter the long sequences
            repeated_seqs = unique_seqs[repeated_inds][final_mask]
            # In order to get the correct indices, we must first limit the
            # search to a shape (on axis=1) of the closest multiple of n.
            # This will avoid taking more indices than we should to show where
            # each repeated sequence begins.
            to = padded_idxs.shape[1] & (-n)
            # Build the final list of indices (that goes from 0 to `to` with
            # a step of n)
            idxs_repeated_seqs = [
                grouped_idxs[i][:to:n] for i in range(grouped_idxs.shape[0])
            ]
    return repeated_seqs, idxs_repeated_seqs
For example,
n = 2
examples = [
    # First example is your original example array.
    np.array([1, 5, 7, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5]),
    # Second example has a long sequence of 5's, and since there aren't
    # any [5, 5] anywhere else, it's not taken into account and therefore
    # should not come out.
    np.array([1, 5, 5, 5, 5, 6, 3, 3, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5]),
    # Third example has the same long sequence but since there is a [5, 5]
    # later, then it should take it into account and this sequence should
    # be found.
    np.array([1, 5, 5, 5, 5, 6, 5, 5, 7, 9, 4, 0, 3, 3, 7, 8, 1, 5]),
    # Fourth example has a [5, 5] first and later it has a long sequence of
    # 5's which are uneven and the previous implementation got confused with
    # the indices to show as the starting indices. In this case, it should be
    # 1, 13 and 15 for [5, 5].
    np.array([1, 5, 5, 9, 4, 6, 3, 3, 7, 9, 4, 0, 3, 5, 5, 5, 5, 5]),
]

for a in examples:
    print(f"\nExample: {a}")
    repeated_seqs, inds = repeated_sequences(a, n)
    for i, seq in enumerate(repeated_seqs):
        print(f"\t{seq} with indexes {inds[i]}")
Output (as expected):
Example: [1 5 7 9 4 6 3 3 7 9 4 0 3 3 7 8 1 5]
[1 5] with indexes [0 16]
[3 3] with indexes [6 12]
[3 7] with indexes [7 13]
[7 9] with indexes [2 8]
[9 4] with indexes [3 9]
Example: [1 5 5 5 5 6 3 3 7 9 4 0 3 3 7 8 1 5]
[1 5] with indexes [0 16]
[3 3] with indexes [6 12]
[3 7] with indexes [7 13]
Example: [1 5 5 5 5 6 5 5 7 9 4 0 3 3 7 8 1 5]
[1 5] with indexes [ 0 16]
[5 5] with indexes [1 3 6]
Example: [1 5 5 9 4 6 3 3 7 9 4 0 3 5 5 5 5 5]
[5 5] with indexes [ 1 13 15]
[9 4] with indexes [3 9]
You can test it out yourself with more examples and more cases. Keep in mind this is what I understood from your disclaimer. If you want to count the long sequences as one, even if multiple sequences are in there (for example, [5, 5] appears twice in [5, 5, 5, 5]), this won't work for you and you'd have to come up with something else.
x is an N by M matrix.
y is a 1 by L vector.
I want to return the "outer product" between x and y, call it z:
z[n,m,l] = x[n,m] * y[l]
I could probably do this using einsum
np.einsum("ij,k->ijk", x, y.ravel())
or reshape afterwards
np.outer(x, y).reshape((x.shape[0], x.shape[1], y.size))
But I'm wondering whether this can be done with np.outer alone, or with something simpler and more memory efficient.
Is there a way?
It's one of those numpy "can't know unless you happen to know" bits: np.outer flattens multidimensional inputs while np.multiply.outer doesn't:
m,n,l = 3,4,5
x = np.arange(m*n).reshape(m,n)
y = np.arange(l)
np.multiply.outer(x,y).shape
# (3, 4, 5)
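For contrast (a small added example), np.outer ravels both inputs first, so the 2-d structure of x is lost:

np.outer(x, y).shape
# (12, 5)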
The code for outer is:
multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
As its docs say, it flattens the inputs (i.e. ravels them). If the arrays are already 1d, that expression could be written as
a[:,None] * b[None,:]
a[:,None] * b # broadcasting auto adds the None to b
We could apply broadcasting rules to your (n,m)*(1,l):
In [2]: x = np.arange(12).reshape(3,4); y = np.array([[1,2]])
In [3]: x.shape, y.shape
Out[3]: ((3, 4), (1, 2))
You want an (n,m,l) result, which an (n,m,1) * (1,1,l) broadcast achieves. We need to add a trailing dimension to x. The extra leading 1 on y is automatic:
In [4]: z = x[...,None]*y
In [5]: z.shape
Out[5]: (3, 4, 2)
In [6]: z
Out[6]:
array([[[ 0, 0],
[ 1, 2],
[ 2, 4],
[ 3, 6]],
[[ 4, 8],
[ 5, 10],
[ 6, 12],
[ 7, 14]],
[[ 8, 16],
[ 9, 18],
[10, 20],
[11, 22]]])
Using einsum:
In [8]: np.einsum('nm,kl->nml', x, y).shape
Out[8]: (3, 4, 2)
The fact that you approved np.multiply.outer:
In [9]: np.multiply.outer(x,y).shape
Out[9]: (3, 4, 1, 2)
suggests y isn't really (1,l) but rather (l,), since a (1,l) y produces this extra singleton dimension. Adjusting for either is easy.
I don't think there's much difference in memory efficiency among these. In this small example, In [4] is the fastest, but not by much.
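A quick sanity check, added here, that the three approaches agree (it assumes y is 1-d, per the note above):

import numpy as np
x = np.arange(12).reshape(3, 4)
y = np.arange(5)
z1 = x[..., None] * y                  # broadcasting
z2 = np.multiply.outer(x, y)           # ufunc outer
z3 = np.einsum('nm,l->nml', x, y)      # einsum
assert z1.shape == (3, 4, 5)
assert (z1 == z2).all() and (z1 == z3).all()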
Let's say I have a 2-d numpy array
a = np.array([[1, 1, 2, 2],
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [3, 3, 4, 4]])
and a 3-d numpy array like
b = np.array([[[1, 2, 8, 8],
[3, 4, 8, 8],
[8, 7, 0, 1],
[6, 5, 3, 2]],
[[1, 1, 1, 3],
[1, 1, 4, 2],
[0, 3, 2, 1],
[3, 2, 3, 9]]])
I want to calculate statistics (mean, median, majority, sum, count, ...) of b according to the "IDs" in a.
Example: sum should result in another array (or a list, if that is easier) that gives the sum of the values in b per ID. There are 4 unique "IDs" in a: 1, 2, 3, 4, and 2 'layers' in b. For the 1's in a that is a sum of 10 (layer 0) and 4 (layer 1). For the 2's it's 32 (layer 0) and 10 (layer 1), and so on...
Expected result for sum:
sums = [[1, 10, 4],
[2, 32, 10],
[3, 26, 8],
[4, 6, 15]]
Expected result for mean:
avgs = [[1, 2.5, 1.0 ],
[2, 8.0, 2.5 ],
[3, 6.5, 2.0 ],
[4, 1.5, 3.75]]
My guess, is that there is a handy function in numpy that does that already, but I am not sure what to search for exactly. Any pointers of how to do it, or what to search for, are much appreciated.
Update:
I came up with this for-loop, which is fine for very small arrays. However, my arrays are much larger than 4 by 4 and a faster implementation is needed.
result = []
ids = np.unique(a)
for id in ids:
    line = [id]
    for band in range(0, b.shape[0]):
        cell = b[band][np.where(a == id)]
        line.append(cell.mean())
        # line.append(cell.min())
        # line.append(cell.max())
        # line.append(cell.std())
        line.append(cell.sum())
        line.append(np.median(cell))
    result.append(line)
You can try the code below
cal_sums = [[b[j, :, :][np.argwhere(a == i)[:, 0], np.argwhere(a == i)[:, 1]].sum()
             for i in np.unique(a)] for j in range(2)]
cal_mean = [[b[j, :, :][np.argwhere(a == i)[:, 0], np.argwhere(a == i)[:, 1]].mean()
             for i in np.unique(a)] for j in range(2)]
sums = np.zeros((np.unique(a).size, b.shape[0] + 1))
means = np.zeros((np.unique(a).size, b.shape[0] + 1))
sums[:, 0], sums[:, 1:] = np.unique(a), np.asarray(cal_sums).T
means[:, 0], means[:, 1:] = np.unique(a), np.asarray(cal_mean).T
print(sums)
[[ 1. 10. 4.]
[ 2. 32. 10.]
[ 3. 26. 8.]
[ 4. 6. 15.]]
print(means)
[[1. 2.5 1. ]
[2. 8. 2.5 ]
[3. 6.5 2. ]
[4. 1.5 3.75]]
I tested it with quite a large array size and it is fast:
n = 1000
a = np.random.randint(1, 5, size=(n, n))
b = np.random.randint(1, 10, size=(2, n, n))
speed:
377 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
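For what it's worth, a fully vectorized alternative (a sketch of mine; sums_table and means_table are my own names) uses np.unique's return_inverse together with np.bincount, avoiding the per-ID Python loop entirely:

import numpy as np

n = 1000
a = np.random.randint(1, 5, size=(n, n))
b = np.random.randint(1, 10, size=(2, n, n))

ids, inv = np.unique(a, return_inverse=True)
inv = inv.ravel()                 # bincount needs a 1-d index array
counts = np.bincount(inv)         # number of cells per ID
# One bincount per layer: sums[k, i] = sum of b[k] where a == ids[i]
sums = np.stack([np.bincount(inv, weights=layer.ravel()) for layer in b])
means = sums / counts
sums_table = np.column_stack([ids, sums.T])    # columns: id, layer 0, layer 1
means_table = np.column_stack([ids, means.T])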
I wish to apply the Hungarian algorithm to many subsets of a numpy matrix C, indexed by cross products of lists row_ind and col_ind. Currently, I see the following options to do so:
Double slicing:
linear_sum_assignment(C[row_ind,:][:,col_ind])
Problem: two copies per subset operation.
Advanced slicing via np.ix_:
linear_sum_assignment(C[np.ix_(row_ind, col_ind)])
Problem: one copy per subset; np.ix_ seemed inefficient (I assumed it allocates an n x n matrix).
UPDATE: as noted by @hpaulj, np.ix_ doesn't in fact allocate an n x n matrix, but it is somehow still slower than option 1.
Masked array.
Problem: doesn't work with linear_sum_assignment.
So, no option is satisfying.
What is ideally desired is the ability to specify a submatrix view using the matrix C and a pair of one-dimensional masks for rows and cols respectively, so that such a view could be passed to linear_sum_assignment. For another linear_sum_assignment call, I would quickly adjust the masks but never modify or copy/subset the full matrix.
Is there something similar already available in numpy?
What is the most efficient way (as little copies/memory allocations as possible) to process multiple submatrices of the same big matrix?
The different ways of indexing an array with lists/arrays all time about the same. They all produce copies, not views.
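A quick np.shares_memory probe (a small example added here) makes the copy-vs-view distinction concrete:

import numpy as np
C = np.arange(16).reshape(4, 4)
rows, cols = np.array([0, 2]), np.array([1, 3])
print(np.shares_memory(C, C[np.ix_(rows, cols)]))  # False: fancy indexing copies
print(np.shares_memory(C, C[::2, 1::2]))           # True: slicing gives a view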
For example
In [99]: arr = np.ones((1000,1000),int)
In [100]: id1=np.arange(0,1000,10)
In [101]: id2=np.arange(0,1000,20)
In [105]: timeit arr[id1,:][:,id2].shape
52.5 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [106]: timeit arr[np.ix_(id1,id2)].shape
66.5 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In contrast if I use slices (in this case selecting the same elements), I get a view, which is much faster:
In [107]: timeit arr[::10,::20].shape
661 ns ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
ix_ doesn't create a (m,n) array; it returns a tuple of adjusted 1d arrays. It's the equivalent of
In [108]: timeit arr[id1[:,None], id2].shape
54.5 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The timing difference is primarily due to an extra layer of function calls.
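To see what ix_ actually returns (a small added illustration): a tuple of broadcastable open-mesh index arrays, not a full 2d index grid.

np.ix_([1, 3], [2, 5])
# (array([[1],
#         [3]]),
#  array([[2, 5]]))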
Your scipy link has a [source] link:
https://github.com/scipy/scipy/blob/v0.19.1/scipy/optimize/_hungarian.py#L13-L107
The optimize.linear_sum_assignment function creates a _Hungary object from the cost_matrix. That makes a copy, and the function solves the problem by searching and manipulating the copy's values.
Using the documentation example:
In [109]: cost = np.array([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
In [110]: optimize.linear_sum_assignment(cost)
Out[110]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
What it does is create a state object:
In [111]: H=optimize._hungarian._Hungary(cost)
In [112]: vars(H)
Out[112]:
{'C': array([[4, 1, 3],
[2, 0, 5],
[3, 2, 2]]),
'Z0_c': 0,
'Z0_r': 0,
'col_uncovered': array([ True, True, True], dtype=bool),
'marked': array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]),
'path': array([[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]),
'row_uncovered': array([ True, True, True], dtype=bool)}
It iterates,
In [113]: step=optimize._hungarian._step1
In [114]: while step is not None:
...: step = step(H)
...:
And the resulting state is:
In [115]: vars(H)
Out[115]:
{'C': array([[1, 0, 1],
[0, 0, 4],
[0, 1, 0]]),
'Z0_c': 0,
'Z0_r': 1,
'col_uncovered': array([False, False, False], dtype=bool),
'marked': array([[0, 1, 0],
[1, 0, 0],
[0, 0, 1]]),
'path': array([[1, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]),
'row_uncovered': array([ True, True, True], dtype=bool)}
The solution is pulled from the marked array
In [116]: np.where(H.marked)
Out[116]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
The total cost is the sum of these values:
In [122]: cost[np.where(H.marked)]
Out[122]: array([1, 2, 2])
But the cost from the C array in the final state is 0:
In [124]: H.C[np.where(H.marked)]
Out[124]: array([0, 0, 0])
So even if the submatrix that you give to optimize.linear_sum_assignment is a view, the search still involves a copy. The search space and time increase significantly with the size of this cost matrix.
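Given that, a pragmatic pattern (my suggestion, not from the answer above) is to accept the one copy per subset, keep the index arrays around, and map each per-subset solution back to full-matrix coordinates:

import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.random.rand(100, 100)
subsets = [(np.arange(50), np.arange(10, 60)),
           (np.arange(25, 75), np.arange(50))]
for rows, cols in subsets:
    r, c = linear_sum_assignment(C[np.ix_(rows, cols)])  # one copy per call
    full_rows, full_cols = rows[r], cols[c]              # back in C's coordinates
    total_cost = C[full_rows, full_cols].sum()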