I wish to apply the Hungarian algorithm to many subsets of a numpy matrix C, each indexed by the cross product of lists row_ind and col_ind. Currently, I see the following options:
Double slicing:
linear_sum_assignment(C[row_ind,:][:,col_ind])
Problem: two copies per subset operation.
Advanced slicing via np.ix_:
linear_sum_assignment(C[np.ix_(row_ind, col_ind)])
Problem: one copy per subset, and np.ix_ is inefficient (allocates an n x n matrix).
UPDATE: as noted by @hpaulj, np.ix_ does not in fact allocate an n x n matrix, but it is somehow still slower than option 1.
Masked array.
Problem: doesn't work with linear_sum_assignment.
So, no option is satisfying.
Ideally, I would like to specify a submatrix view using the matrix C and a pair of 1-d masks for rows and columns respectively, so that such a view could be passed to linear_sum_assignment. For the next linear_sum_assignment call, I would quickly adjust the masks but never modify or copy/subset the full matrix.
Is there something similar already available in numpy?
What is the most efficient way (as few copies/memory allocations as possible) to process multiple submatrices of the same big matrix?
The different ways of indexing an array with lists/arrays take about the same time. They all produce copies, not views.
For example
In [99]: arr = np.ones((1000,1000),int)
In [100]: id1=np.arange(0,1000,10)
In [101]: id2=np.arange(0,1000,20)
In [105]: timeit arr[id1,:][:,id2].shape
52.5 µs ± 243 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [106]: timeit arr[np.ix_(id1,id2)].shape
66.5 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In contrast if I use slices (in this case selecting the same elements), I get a view, which is much faster:
In [107]: timeit arr[::10,::20].shape
661 ns ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
ix_ doesn't create an (m,n) array; it returns a tuple of reshaped 1d arrays. It's the equivalent of
In [108]: timeit arr[id1[:,None], id2].shape
54.5 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The timing difference is primarily due to an extra layer of function calls.
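These copy-versus-view claims can be checked with np.shares_memory, which reports whether two arrays overlap in memory. The sketch below rebuilds arr, id1 and id2 from the session above; it only confirms copy/view behaviour and the shapes np.ix_ returns, and makes no timing claims.

```python
import numpy as np

arr = np.ones((1000, 1000), int)
id1 = np.arange(0, 1000, 10)
id2 = np.arange(0, 1000, 20)

double = arr[id1, :][:, id2]      # two advanced-indexing steps, two copies
via_ix = arr[np.ix_(id1, id2)]    # one advanced-indexing step, one copy
sliced = arr[::10, ::20]          # basic slicing, a view

# np.ix_ returns broadcastable 1-d index arrays, not an (m, n) index array
rows, cols = np.ix_(id1, id2)
print(rows.shape, cols.shape)     # (100, 1) (1, 50)

# advanced indexing copies; slicing shares memory with arr
print(np.shares_memory(double, arr), np.shares_memory(via_ix, arr))  # False False
print(np.shares_memory(sliced, arr))                                 # True
print(np.array_equal(double, via_ix), np.array_equal(double, sliced))  # True True
```

Only regularly spaced row/column selections can be expressed as slices, so the view route is not available for arbitrary row_ind/col_ind lists.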
Your scipy link has a [source] link:
https://github.com/scipy/scipy/blob/v0.19.1/scipy/optimize/_hungarian.py#L13-L107
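This optimize.linear_sum_assignment function creates a _Hungary object with the cost_matrix. That makes a copy, and solves the problem by searching and manipulating its values. (This describes SciPy 0.19.1, the version linked above; starting with 1.4, SciPy reimplemented linear_sum_assignment in compiled code, so the _Hungary internals shown below are specific to older versions.)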
Using the documentation example:
In [110]: optimize.linear_sum_assignment(cost)
Out[110]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
What it does is create a state object:
In [111]: H=optimize._hungarian._Hungary(cost)
In [112]: vars(H)
Out[112]:
{'C': array([[4, 1, 3],
[2, 0, 5],
[3, 2, 2]]),
'Z0_c': 0,
'Z0_r': 0,
'col_uncovered': array([ True, True, True], dtype=bool),
'marked': array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]),
'path': array([[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]),
'row_uncovered': array([ True, True, True], dtype=bool)}
It then iterates through the solver steps:
In [113]: step=optimize._hungarian._step1
In [114]: while step is not None:
     ...:     step = step(H)
     ...:
And the resulting state is:
In [115]: vars(H)
Out[115]:
{'C': array([[1, 0, 1],
[0, 0, 4],
[0, 1, 0]]),
'Z0_c': 0,
'Z0_r': 1,
'col_uncovered': array([False, False, False], dtype=bool),
'marked': array([[0, 1, 0],
[1, 0, 0],
[0, 0, 1]]),
'path': array([[1, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]]),
'row_uncovered': array([ True, True, True], dtype=bool)}
The solution is pulled from the marked array
In [116]: np.where(H.marked)
Out[116]: (array([0, 1, 2], dtype=int32), array([1, 0, 2], dtype=int32))
The total cost is the sum of these values:
In [122]: cost[np.where(H.marked)]
Out[122]: array([1, 2, 2])
But the corresponding values in the final C array are all 0, because the solver reduced its own copy of the costs:
In [124]: H.C[np.where(H.marked)]
Out[124]: array([0, 0, 0])
So even if the submatrix that you pass to optimize.linear_sum_assignment is a view, the solver still makes a copy. The search space and time increase significantly with the size of this cost matrix.
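Since a copy is made anyway, a simple pattern is to build each submatrix once with np.ix_ and translate the solver's indices back to C's own row and column numbers. This is a sketch: assign_subset is a hypothetical helper name, and C below is the documentation's 3x3 cost matrix standing in for a larger matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_subset(C, row_ind, col_ind):
    """Solve the assignment problem on C restricted to row_ind x col_ind (hypothetical helper).

    Returns row and column numbers in C's own indexing, plus the total cost.
    """
    row_ind = np.asarray(row_ind)
    col_ind = np.asarray(col_ind)
    sub = C[np.ix_(row_ind, col_ind)]      # one copy of the submatrix
    r, c = linear_sum_assignment(sub)      # r, c index into sub, not into C
    return row_ind[r], col_ind[c], sub[r, c].sum()

C = np.array([[4, 1, 3],
              [2, 0, 5],
              [3, 2, 2]])

rows, cols, total = assign_subset(C, [0, 2], [1, 2])
print(rows, cols, total)   # [0 2] [1 2] 3
```

Here the submatrix is [[1, 3], [2, 2]], whose cheapest assignment is the diagonal (1 + 2 = 3); the helper reports it as C[0, 1] and C[2, 2].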
Related
Suppose I have two arrays, A (shape K,L,M) and B (shape K,M).
I want to iterate vectorwise and construct an output C (same shape as A) by running a function f on each input vector a and scalar b and reassembling the results into the output, i.e. C[i, :, j] = f(A[i, :, j], B[i, j]). In this example the vector axis is axis 1 of A, but in general it could be any other axis.
After reading the documentation page of nditer, I thought it would be appropriate and elegant to use here, since it can apparently allocate everything for you, allows a separate external loop, and makes reassembling the output easy.
However, I cannot even get something as simple as a vector-wise copy (again along an axis) of an existing array to work properly with nditer. Is what I want simply not possible with nditer, or am I using it wrong?
def test(arr, offsets, axis=0):
    #out = np.zeros_like(arr)
    with np.nditer([arr, None], flags=['external_loop'],  #[arr, out]
                   op_flags=[['readonly'], ['writeonly', 'allocate']],
                   op_axes=[[axis], None],  #[[axis], [axis]]
                   ) as ndit:
        for i, o in ndit:
            print(i.shape, o.shape)
            o[...] = i
        return ndit.operands[1]
tested = test(xam.data, shifts, axis=1)
print('test output shape', tested.shape)
>>> (<L>,) (<L>,)
>>> test output shape (<L>,)
This produces output only for the very first input vector. Even if I explicitly pass an output with the same shape as the input (e.g. via the commented-out changes), the nditer still runs only once, on the very first length-L vector.
>>> (<L>,) (<L>,)
>>> test output shape (<N>, <L>, <M>)
I have made an alternative version using rollaxis views, but it is not particularly pretty or intuitive, so I was wondering whether this should also be possible with nditer somehow...
def test2(arr, offsets, axis=0):
    arr_r = np.rollaxis(arr, axis).reshape((arr.shape[axis], -1)).T
    out = np.zeros_like(arr)
    # caution: reshape silently returns a copy when the rolled axes can't be merged
    # (e.g. the middle axis of a 3-d array); then writes miss out and it stays all zeros
    out_r = np.rollaxis(out, axis).reshape((arr.shape[axis], -1)).T # create view
    for i, o in zip(arr_r, out_r):
        o[...] = i
    return out
Changing your function to work with a list/tuple of axes:
In [378]: def test(arr, offsets, axis=0):
     ...:     #out = np.zeros_like(arr)
     ...:     with np.nditer([arr, None],flags=['external_loop'], #[arr, out]
     ...:                    op_flags=[['readonly'], ['writeonly', 'allocate']],
     ...:                    op_axes=[axis, None], #[[axis], [axis]]
     ...:                    ) as ndit:
     ...:         for i, o in ndit:
     ...:             print(i.shape, o.shape)
     ...:             print(i)
     ...:             o[...] = i
     ...:         return ndit.operands[1]
     ...:
Now it iterates on the whole 2d array. With external_loop it passes a whole (flat) array.
In [379]: test(np.arange(12).reshape((3,4)),0,axis=[0,1])
(12,) (12,)
[ 0 1 2 3 4 5 6 7 8 9 10 11]
Out[379]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [380]: test(np.arange(12).reshape((3,4)),0,axis=[1,0])
(12,) (12,)
[ 0 1 2 3 4 5 6 7 8 9 10 11]
Out[380]:
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
test2
Adding the print to test2 to better see what's passed:
In [385]: def test2(arr, offsets, axis=0):
     ...:     arr_r = np.rollaxis(arr, axis).reshape((arr.shape[axis], -1)).T
     ...:     out = np.zeros_like(arr)
     ...:     out_r = np.rollaxis(out, axis).reshape((arr.shape[axis], -1)).T # create view
     ...:     for i, o in zip(arr_r, out_r):
     ...:         print(i.shape, i)
     ...:         o[...] = i
     ...:     return out
     ...:
In [386]: test2(np.arange(12).reshape((3,4)),0,axis=0)
(3,) [0 4 8]
(3,) [1 5 9]
(3,) [ 2 6 10]
(3,) [ 3 7 11]
Out[386]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [387]: test2(np.arange(12).reshape((3,4)),0,axis=1)
(4,) [0 1 2 3]
(4,) [4 5 6 7]
(4,) [ 8 9 10 11]
Out[387]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
timings
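Taking out the prints to do timings (test0 and test20 are the print-free versions of test and test2, and test01 is test0 without external_loop):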
nditer:
In [391]: timeit test0(np.arange(12).reshape((3,4)),0,axis=(0,1))
11.6 µs ± 36.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
iteration:
In [392]: timeit test20(np.arange(12).reshape((3,4)),0,axis=0)
26.5 µs ± 732 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
nditer, but without external_loop
In [395]: timeit test01(np.arange(12).reshape((3,4)),0,axis=(0,1))
17.9 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In timing tests nditer is often slower. Here, though, the external_loop case only has to iterate once, passing the whole flattened array to the body.
big picture
So far we have only been iterating through a 2d array. In the intro you talk of using
A (shape K,L,M) and B (shape K,M).
Normally in numpy we try to avoid any iteration. If B is reshaped to (K,1,M) (as with B[:,None,:]), then we can do all kinds of things with them
C = A + B[:,None]
C = A * B[:,None]
without needing to iterate. Any Python-level iteration over arrays slows down the code.
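When f really cannot be vectorized, the loop from test2 can be written so that results always land in the output: np.moveaxis returns a view of out whichever axis is chosen, whereas the reshape in test2 may return a copy for the middle axis of a 3-d array. This is one possible sketch, not an nditer solution; apply_vectorwise is a hypothetical name, and f is assumed to return an array with the same length as its input vector.

```python
import numpy as np

def apply_vectorwise(f, A, B, axis=1):
    """Apply f(a, b) to every 1-d slice a of A along `axis` (hypothetical helper).

    B must have A's shape with `axis` removed; b is the matching element of B.
    """
    A_m = np.moveaxis(A, axis, -1)       # view of A with the vector axis last
    out = np.empty_like(A)
    out_m = np.moveaxis(out, axis, -1)   # view of out, so assignments land in out
    for idx in np.ndindex(B.shape):      # every index over the non-vector axes
        out_m[idx] = f(A_m[idx], B[idx])
    return out

K, L, M = 2, 3, 4
A = np.arange(K * L * M).reshape(K, L, M)
B = np.arange(K * M).reshape(K, M)

C = apply_vectorwise(lambda a, b: a + b, A, B, axis=1)
print(np.array_equal(C, A + B[:, None, :]))   # True: the broadcast version needs no loop
```

A per-vector shift, e.g. apply_vectorwise(np.roll, A, B, axis=1), uses the same loop; for elementwise arithmetic the broadcast form shown above is preferable.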
Numpy apply_along_axis/apply_over_axes assume that the applied function returns a scalar, but what if I want to use a function that returns an array (thus adding new dimensions)?
Below is a simplified example. I want to apply my_func to each row of an array. I could do this in pandas but expect numpy to be faster.
Function:
def my_func(k):
    x = np.arange(3)
    y = x ** k
    return y
Original array:
array([[1],
[2],
[3]])
Expected result:
array([[ 0, 1, 2, 3],
[ 0, 1, 4, 9],
[ 0, 1, 8, 27]], dtype=int32)
Update: it was an oversimplified example. I should have said the real function can only take a scalar as input. But the solution proposed by Michael Szczesny in comments works for such functions too.
Update2: I should have said a function that does not broadcast, like this:
def my_func(k):
    return np.random.randint(1, 4, 5) + k
I am sharing the code for your reference:
import numpy as np
def my_func(k):
    x = np.arange(4)
    y = x ** k
    return y
inp = np.array([[1],[2],[3]])
print(my_func(inp))
Output:
[[ 0 1 2 3]
[ 0 1 4 9]
[ 0 1 8 27]]
See if it helps?
Your function, with an added print to see exactly what k is:
In [39]: def my_func(k):
    ...:     print(k)
    ...:     x = np.arange(4) # range to match your expected result
    ...:     y = x ** k
    ...:     return y
    ...:
As written the function works with your (3,1) array, arr = np.arange(1,4)[:,None]:
In [40]: my_func(arr)
[[1]
[2]
[3]]
Out[40]:
array([[ 0, 1, 2, 3],
[ 0, 1, 4, 9],
[ 0, 1, 8, 27]])
Note the whole 2d array is passed. The x**k step works by broadcasting, using a (4,) array with a (3,1), to produce a (3,4) result. You should, if possible, write functions that work like this, taking full advantage of the numpy methods and operators.
apply_along_axis can be used as here:
In [41]: np.apply_along_axis(my_func, 1, arr)
[1]
[2]
[3]
Out[41]:
array([[ 0, 1, 2, 3],
[ 0, 1, 4, 9],
[ 0, 1, 8, 27]])
Note that it passes (1,) arrays to the function. The docs should make it clear that this is designed to pass a 1d array to the function, NOT a scalar.
For a 2d arr, the equivalent is:
In [42]: np.array([my_func(i) for i in arr])
[1]
[2]
[3]
Out[42]:
array([[ 0, 1, 2, 3],
[ 0, 1, 4, 9],
[ 0, 1, 8, 27]])
Now let's comment out the print and do some time tests:
In [44]: timeit my_func(arr)
7.41 µs ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [45]: timeit np.apply_along_axis(my_func, 1, arr)
89.2 µs ± 649 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [46]: timeit np.array([my_func(i) for i in arr])
28.9 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
The broadcasted approach is fastest. apply_along_axis is slowest.
I claim that apply_along_axis is only useful when the array has more than 2 dimensions, and even then it just makes the code prettier, not faster.
For example with a 3d array, that still broadcasts with the (4,) shape x:
In [47]: arr = np.arange(24).reshape(2,3,4)
In [49]: np.apply_along_axis(my_func, 2, arr).shape
Out[49]: (2, 3, 4)
In [50]: my_func(arr).shape
Out[50]: (2, 3, 4)
In [51]: np.array([[my_func(arr[i,j,:]) for j in range(3)] for i in range(2)]).shape
Out[51]: (2, 3, 4)
The list iteration requires a double loop. apply_along_axis hides this, but does not reduce the total number of calls to my_func.
If your function really requires a scalar (e.g. it uses math.cos or an if test), then you might consider np.vectorize. For small examples it's slower than the equivalent list comprehension, but it scales better for large ones. But again, if you can write the function to work directly with arrays, you'll be much happier with the performance.
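For a function like the one in Update2, which needs a scalar and returns a fixed-length array, np.vectorize accepts a signature argument describing that shape change: '()->(n)' means one scalar in, one 1-d array out. This is a sketch; the seeded generator stands in for np.random.randint only so the run is reproducible. np.vectorize is still essentially a Python loop underneath, so this is about convenience rather than speed.

```python
import numpy as np

rng = np.random.default_rng(0)

def my_func(k):
    # like Update2: only meaningful for a scalar k, returns a length-5 array
    return rng.integers(1, 4, 5) + k

arr = np.array([[1], [2], [3]])

vec_func = np.vectorize(my_func, signature='()->(n)')
res = vec_func(arr[:, 0])        # (3,) scalars in -> (3, 5) result
print(res.shape)                 # (3, 5)

# the equivalent list comprehension
res_lc = np.array([my_func(k) for k in arr[:, 0]])
print(res_lc.shape)              # (3, 5)
```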
Let's say I have a 2-d numpy array
a = np.array([[1, 1, 2, 2],
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [3, 3, 4, 4]])
and a 3-d numpy array like
b = np.array([[[1, 2, 8, 8],
[3, 4, 8, 8],
[8, 7, 0, 1],
[6, 5, 3, 2]],
[[1, 1, 1, 3],
[1, 1, 4, 2],
[0, 3, 2, 1],
[3, 2, 3, 9]]])
I want to calculate the statistics (mean, median, majority, sum, count,...) of b according to the "IDs" in a.
Example: sum should result in another array (or a list, if that is easier) that gives the sum of the values in b. There are 4 unique "IDs" in a (1, 2, 3, 4) and 2 'layers' in b. For the 1's in a the sum is 10 (layer 0) and 4 (layer 1). For the 2's it's 32 (layer 0) and 10 (layer 1), and so on...
Expected result for sum:
sums = [[1, 10, 4],
[2, 32, 10],
[3, 26, 8],
[4, 6, 15]]
Expected result for mean:
avgs = [[1, 2.5, 1.0 ],
[2, 8.0, 2.5 ],
[3, 6.5, 2.0 ],
[4, 1.5, 3.75]]
My guess, is that there is a handy function in numpy that does that already, but I am not sure what to search for exactly. Any pointers of how to do it, or what to search for, are much appreciated.
Update:
I came up with this for-loop, which is fine for very small arrays. However, my arrays are much larger than 4 by 4 and a faster implementation is needed.
result = []
ids = np.unique(a)
for id in ids:
    line = [id]
    for band in range(0, b.shape[0]):
        cell = b[band][np.where(a == id)]
        line.append(cell.mean())
        # line.append(cell.min())
        # line.append(cell.max())
        # line.append(cell.std())
        line.append(cell.sum())
        line.append(np.median(cell))
    result.append(line)
You can try the code below
cal_sums = [[b[j, :, :][np.argwhere(a==i)[:,0],np.argwhere(a==i)[:,1]].sum()
for i in np.unique(a)] for j in range(2)]
cal_mean = [[b[j, :, :][np.argwhere(a==i)[:,0],np.argwhere(a==i)[:,1]].mean()
for i in np.unique(a)] for j in range(2)]
sums = np.zeros((np.unique(a).size, b.shape[0]+1))
means = np.zeros((np.unique(a).size, b.shape[0]+1))
sums[:, 0] , sums[:,1:] = np.unique(a), np.asarray(cal_sums).T
means[:, 0] , means[:,1:] = np.unique(a), np.asarray(cal_mean).T
print(sums)
[[ 1. 10. 4.]
[ 2. 32. 10.]
[ 3. 26. 8.]
[ 4. 6. 15.]]
print(means)
[[1. 2.5 1. ]
[2. 8. 2.5 ]
[3. 6.5 2. ]
[4. 1.5 3.75]]
I tested it on a fairly large array and it is fast:
n = 1000
a = np.random.randint(1, 5, size=(n, n))
b = np.random.randint(1, 10, size=(2, n, n))
speed:
377 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
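A loop-free alternative for sums, counts and means is np.unique with return_inverse=True, which labels every cell of a with the position of its ID, followed by np.bincount with weights, which adds up the b values per label. This sketch reproduces the expected sums and avgs from the question; it does not cover median or majority, which need a sort-based approach (scipy.ndimage.median, which takes labels and index arguments, is one option for the median).

```python
import numpy as np

a = np.array([[1, 1, 2, 2],
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [3, 3, 4, 4]])
b = np.array([[[1, 2, 8, 8],
               [3, 4, 8, 8],
               [8, 7, 0, 1],
               [6, 5, 3, 2]],
              [[1, 1, 1, 3],
               [1, 1, 4, 2],
               [0, 3, 2, 1],
               [3, 2, 3, 9]]])

ids, inv = np.unique(a, return_inverse=True)
# ravel() so this works whether inv comes back flat or shaped like a (differs across NumPy versions)
inv = inv.ravel()

counts = np.bincount(inv, minlength=ids.size)                 # cells per ID
sums = np.stack([np.bincount(inv, weights=layer.ravel(), minlength=ids.size)
                 for layer in b], axis=1)                     # shape (n_ids, n_layers)
means = sums / counts[:, None]

print(np.column_stack([ids, sums]))    # rows: ID, layer-0 sum, layer-1 sum
print(np.column_stack([ids, means]))   # rows: ID, layer-0 mean, layer-1 mean
```

Each layer costs one pass over the array, instead of one boolean comparison of a per ID as in the loop versions.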
I have a 3d numpy array of the following shape:
(3600L, 7200L, 3L)
If any element in any dimension is 0, how can I set the elements at the same position in the other two dimensions to 0?
An element that is 0 has a position along each of the dimensions. I'll illustrate with a small 2d array:
In [1240]: M=np.arange(9).reshape(3,3)
In [1241]: M
Out[1241]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [1242]: M[0,0]
Out[1242]: 0
One element is 0, at row 0 and column 0. I can set the rest of that row and that column to 0 with:
In [1243]: M[0,:]=0
In [1244]: M[:,0]=0
In [1245]: M
Out[1245]:
array([[0, 0, 0],
[0, 4, 5],
[0, 7, 8]])
You can generalize this to 3d and larger arrays, as long as you know the coordinates of that element in all dimensions. With a 3d array
M[i,:,:]=0
actually sets all the values in a plane (2d) to 0. Similarly for M[:,j,:] and M[:,:,k].
np.where gives the coordinates that match some condition:
In [1248]: I=np.where(M==0)
In [1249]: M[I[0],:]=0
In [1250]: M[:,I[1]]=0
In [1251]: M
Out[1251]:
array([[0, 0, 0],
[0, 4, 5],
[0, 7, 8]])
In [1252]: I
Out[1252]: (array([0], dtype=int32), array([0], dtype=int32))
This works regardless of whether the match is for 1 element, 0, or more. Here it's just one.
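If the question instead means that, for a (rows, cols, 3) array, a 0 in any of the 3 bands at a given (row, col) position should zero the other bands at that same position, a boolean mask over the first two axes does it without np.where. This is a sketch under that reading of the question, using a small stand-in array.

```python
import numpy as np

# small stand-in for the (3600, 7200, 3) array
M = np.array([[[1, 2, 3], [0, 5, 6]],
              [[7, 0, 9], [1, 1, 1]]])

mask = (M == 0).any(axis=-1)   # shape (2, 2): True where any band is 0
M[mask] = 0                    # zero all 3 bands at those positions
print(M)
```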
You have an original sparse matrix X:
>>print type(X)
>>print X.todense()
<class 'scipy.sparse.csr.csr_matrix'>
[[1,4,3]
[3,4,1]
[2,1,1]
[3,6,3]]
You have a second sparse matrix Z, which is derived from some rows of X (say the values are doubled so we can see the difference between the two matrices). In pseudo-code:
>>Z = X[[0,2,3]]
>>print Z.todense()
[[1,4,3]
[2,1,1]
[3,6,3]]
>>Z = Z*2
>>print Z.todense()
[[2, 8, 6]
[4, 2, 2]
[6, 12,6]]
What's the best way of retrieving the rows in Z using the ORIGINAL indices from X. So for instance, in pseudo-code:
>>print Z[[0,3]]
[[2,8,6]    #0 from Z, and what would be row **0** from X
 [6,12,6]]  #2 from Z, but what would be row **3** from X
That is, how can you retrieve rows from Z using indices that refer to the rows' positions in the original matrix X? You can't modify X in any way (you can't add an index column to X), but there are no other limits.
If you have the original indices in an array i, and the values in i are in increasing order (as in your example), you can use numpy.searchsorted(i, [0, 3]) to find the indices in Z that correspond to indices [0, 3] in the original X. Here's a demonstration in an IPython session:
In [39]: X = csr_matrix([[1,4,3],[3,4,1],[2,1,1],[3,6,3]])
In [40]: X.todense()
Out[40]:
matrix([[1, 4, 3],
[3, 4, 1],
[2, 1, 1],
[3, 6, 3]])
In [41]: i = array([0, 2, 3])
In [42]: Z = 2 * X[i]
In [43]: Z.todense()
Out[43]:
matrix([[ 2, 8, 6],
[ 4, 2, 2],
[ 6, 12, 6]])
In [44]: Zsub = Z[searchsorted(i, [0, 3])]
In [45]: Zsub.todense()
Out[45]:
matrix([[ 2, 8, 6],
[ 6, 12, 6]])
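If i is not in increasing order, searchsorted still works when given a sorter, and the result can be checked so that requests for rows that were never copied into Z fail loudly. This is a sketch; rows_by_original_index is a hypothetical helper, and the explicit imports replace the bare array/searchsorted names used in the session above.

```python
import numpy as np
from scipy.sparse import csr_matrix

def rows_by_original_index(Z, i, wanted):
    """Select rows of Z by their row numbers in the original X (hypothetical helper).

    i[k] is the original row number of Z's row k; i need not be sorted.
    """
    i = np.asarray(i)
    wanted = np.asarray(wanted)
    order = np.argsort(i)
    pos = np.searchsorted(i, wanted, sorter=order)   # positions within sorted i
    pos = np.clip(pos, 0, len(i) - 1)                # keep out-of-range requests indexable
    hits = order[pos]                                # back to positions in Z
    if not np.array_equal(i[hits], wanted):
        raise KeyError("some requested rows were not copied into Z")
    return Z[hits]

X = csr_matrix([[1, 4, 3], [3, 4, 1], [2, 1, 1], [3, 6, 3]])
i = np.array([3, 0, 2])      # unsorted this time
Z = 2 * X[i]

print(rows_by_original_index(Z, i, [0, 3]).toarray())
# [[ 2  8  6]
#  [ 6 12  6]]
```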