Calculating distance between each element of an array - numpy

I have an array,
a = np.array([1, 3, 5, 10])
I would like to create a function that calculates the distance between each element and every other element. There should be no for loop, as speed is critical.
The expected result of the above would be:
array([[0, 2, 4, 9],
[2, 0, 2, 7],
[4, 2, 0, 5],
[9, 7, 5, 0]])

You can use numpy.subtract.outer:
np.abs(np.subtract.outer(a, a))
array([[0, 2, 4, 9],
[2, 0, 2, 7],
[4, 2, 0, 5],
[9, 7, 5, 0]])
Or, equivalently, use any of the following:
np.abs(a - a[:, np.newaxis])
np.abs(a - a[:, None])
np.abs(a - a.reshape((-1, 1)))
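To make the broadcasting step explicit, here is a minimal sketch using the array from the question. The names col, diff, dist and dist_outer are only illustrative. Note that the result is an n-by-n array, so memory use grows quadratically with the length of a.

```python
import numpy as np

a = np.array([1, 3, 5, 10])

# a[:, None] has shape (4, 1), while a has shape (4,).
# Broadcasting stretches both to (4, 4), so diff[i, j] == a[j] - a[i].
col = a[:, None]
diff = a - col
dist = np.abs(diff)

# np.subtract.outer(a, a)[i, j] == a[i] - a[j]: the sign is flipped
# relative to diff, but the absolute values are identical.
dist_outer = np.abs(np.subtract.outer(a, a))

print(dist)
```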

Related

Indexing numpy array using another numpy array [duplicate]

Suppose I have a matrix A with some arbitrary values:
array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
And a matrix B which contains indices of elements in A:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
How do I select values from A pointed by B, i.e.:
A[B] = [[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]]
EDIT: np.take_along_axis is a built-in function for this use case, available since NumPy 1.15. See @hpaulj's answer below for how to use it.
You can use NumPy's advanced indexing -
A[np.arange(A.shape[0])[:,None],B]
One can also use linear indexing -
m,n = A.shape
out = np.take(A,B + n*np.arange(m)[:,None])
Sample run -
In [40]: A
Out[40]:
array([[2, 4, 5, 3],
[1, 6, 8, 9],
[8, 7, 0, 2]])
In [41]: B
Out[41]:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
In [42]: A[np.arange(A.shape[0])[:,None],B]
Out[42]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
In [43]: m,n = A.shape
In [44]: np.take(A,B + n*np.arange(m)[:,None])
Out[44]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
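As a rough illustration of why both approaches work, the sketch below spells out the intermediate index arrays. The names rows, flat_idx, out_adv and out_lin are only illustrative, and the linear-indexing part assumes np.take is called without an axis, so it indexes the flattened array.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
              [0, 3, 2, 1],
              [3, 2, 1, 0]])

m, n = A.shape
rows = np.arange(m)[:, None]   # shape (m, 1): [[0], [1], [2]]

# Advanced indexing: rows broadcasts against B, so position (i, j)
# of the result is A[i, B[i, j]].
out_adv = A[rows, B]

# Linear indexing: element (i, j) of the flattened (m, n) array sits at
# position i*n + j, so each row's offset i*n is added to its column indices.
flat_idx = B + n * rows
out_lin = np.take(A, flat_idx)

print(flat_idx)
```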
More recent versions have added a take_along_axis function that does the job:
A = np.array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
np.take_along_axis(A, B, 1)
Out[]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
There's also a put_along_axis.
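A small sketch of the write counterpart, under the assumption that we want to overwrite the largest value in each row with -1 (this task is made up for illustration and is not part of the question). np.put_along_axis modifies its first argument in place and returns None, so the sketch works on a copy.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])

# Column index of the maximum in each row, kept 2-D so that it has
# the same number of dimensions as A.
idx = np.expand_dims(np.argmax(A, axis=1), axis=1)   # [[2], [3], [0]]

# Read the values at those positions ...
row_max = np.take_along_axis(A, idx, axis=1)

# ... and overwrite them in a copy of A.
C = A.copy()
np.put_along_axis(C, idx, -1, axis=1)

print(C)
```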
I know this is an old question, but another way of doing it using indices is:
A[np.indices(B.shape)[0], B]
output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]
The following is a solution using a for loop:
outlist = []
for i in range(len(B)):
    lst = []
    for j in range(len(B[i])):
        lst.append(A[i][B[i][j]])
    outlist.append(lst)
outarray = np.asarray(outlist)
print(outarray)
The above can also be written in a more succinct list-comprehension form:
outlist = [ [A[i][B[i][j]] for j in range(len(B[i]))]
for i in range(len(B)) ]
outarray = np.asarray(outlist)
print(outarray)
Output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]

Create a view containing subsets of numpy array

I have a NumPy array of shape (1000,100).
I would like to create a new array containing the first 100 rows and then all the rows between 200th and 299th (boundaries included). Is there a way to do it using only views, without copying all the data of the array?
Unfortunately, not.
Here is why: A NumPy array draws data from an underlying block of contiguous memory.
The dtype, shape, and strides of the array determine how the data in that block of memory is to be interpreted as values.
Since an array can have only one strides attribute, the values have to be regularly spaced. Therefore, an array cannot be a view of another array which takes values from the original array at irregularly spaced intervals.
Note, however, that Divakar shows that by a clever reshaping to a 3D array, the desired values can be viewed as a slice with a regularly spaced stride. So if you are willing to add another dimension, it is possible to create a view with the desired values.
Building on Divakar's answer, you could also use a.reshape(10,-1,a.shape[1])[:3:2]. This breaks the array into 10 chunks, then slices off the first 3, and steps by 2 -- giving you only the first and third chunks.
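To contrast the two options on an array of the shape given in the question, here is a minimal sketch. The array contents (np.arange) and the names rows, copied and view3d are assumptions for illustration only.

```python
import numpy as np

a = np.arange(1000 * 100).reshape(1000, 100)

# Index-array (fancy) indexing gives the desired rows as a 2-D array,
# but the result is always a copy.
rows = np.r_[0:100, 200:300]
copied = a[rows]

# Reshaping into 10 chunks of 100 rows and taking chunks 0 and 2
# keeps the result a view, at the cost of an extra dimension.
view3d = a.reshape(10, -1, a.shape[1])[:3:2]

print(copied.shape, np.shares_memory(a, copied))   # (200, 100) False
print(view3d.shape, np.shares_memory(a, view3d))   # (2, 100, 100) True
```

Writing through view3d (for example view3d[0, 0, 0] = -1) changes a as well, which is another way to confirm that no data was copied.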
You could have a 3D array of shape (2,100,100) with some slicing and reshaping, where the first element would be the first block (0-99) rows and the second element would represent the second block with values from 200 - 299 rows off the input array.
The implementation would be -
a[:300].reshape(3,-1,a.shape[1])[::2]
Sample run with an input array of shape (20,5), where we try to get rows 0-4 and 10-14 -
1) Input array :
In [364]: a
Out[364]:
array([[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7],
[6, 3, 3, 3, 3],
[1, 6, 1, 3, 5],
[6, 8, 4, 7, 6],
[8, 4, 6, 8, 7],
[4, 8, 3, 5, 2],
[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5],
[2, 6, 8, 2, 4],
[5, 6, 2, 5, 0],
[6, 2, 4, 2, 7],
[3, 1, 6, 8, 4],
[0, 4, 3, 2, 0]])
2) Use proposed slicing and reshaping to get us a 3D array :
In [365]: a[:15].reshape(3,-1,a.shape[1])[::2]
Out[365]:
array([[[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7]],
[[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5]]])
3) Verify output with manual slicing :
In [366]: a[:5]
Out[366]:
array([[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7]])
In [367]: a[10:15]
Out[367]:
array([[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5]])
4) Finally, the most important part: verify that it is indeed a view :
In [368]: np.shares_memory(a, a[:15].reshape(3,-1,a.shape[1])[::2])
Out[368]: True
5) We could of course reshape it afterwards to get a 2D output, but that forces a copy there -
In [371]: a[:15].reshape(3,-1,a.shape[1])[::2].reshape(-1,a.shape[1])
Out[371]:
array([[6, 2, 3, 4, 7],
[4, 7, 7, 4, 7],
[3, 5, 6, 2, 1],
[0, 6, 7, 4, 8],
[1, 5, 8, 6, 7],
[4, 6, 7, 0, 8],
[7, 1, 6, 0, 7],
[1, 5, 5, 4, 4],
[3, 4, 8, 4, 7],
[0, 4, 5, 0, 5]])
In [372]: np.shares_memory(a, _)
Out[372]: False

Python - numpy mgrid and reshape

Can someone explain to me what the second line of this code does?
objp = np.zeros((48,3), np.float32)
objp[:,:2] = np.mgrid[0:8,0:6].T.reshape(-1,2)
Can someone explain to me what exactly the np.mgrid[0:8,0:6] part of the code is doing and what exactly the T.reshape(-1,2) part of the code is doing?
Thanks and good job!
The easiest way to see these is to use smaller values for mgrid:
In [11]: np.mgrid[0:2,0:3]
Out[11]:
array([[[0, 0, 0],
[1, 1, 1]],
[[0, 1, 2],
[0, 1, 2]]])
In [12]: np.mgrid[0:2,0:3].T # (matrix) transpose
Out[12]:
array([[[0, 0],
[1, 0]],
[[0, 1],
[1, 1]],
[[0, 2],
[1, 2]]])
In [13]: np.mgrid[0:2,0:3].T.reshape(-1, 2) # reshape to an Nx2 matrix
Out[13]:
array([[0, 0],
[1, 0],
[0, 1],
[1, 1],
[0, 2],
[1, 2]])
Then objp[:,:2] = sets the 0th and 1st columns of objp to this result.
The second line creates a multi-dimensional mesh grid, transposes it, reshapes it so that it represents two columns and inserts it into the first two columns of the objp array.
Breakdown:
np.mgrid[0:8,0:6] creates the following mgrid:
>> np.mgrid[0:8,0:6]
array([[[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7]],
[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]])
The .T transposes the array, and the .reshape(-1,2) then reshapes it into a two-column array. These two columns are then the correct shape to replace the first two columns of the original array.
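Putting the pieces together with the sizes from the question, a short sketch that tracks the shapes at each step (the names grid and pairs are only illustrative):

```python
import numpy as np

grid = np.mgrid[0:8, 0:6]        # shape (2, 8, 6): grid[0] holds i, grid[1] holds j
pairs = grid.T.reshape(-1, 2)    # .T has shape (6, 8, 2); reshape gives (48, 2)

objp = np.zeros((48, 3), np.float32)
objp[:, :2] = pairs              # the third column is left at 0

print(objp[:3])    # rows (0, 0, 0), (1, 0, 0), (2, 0, 0)
print(objp[8])     # row (0, 1, 0)
```

Each row of pairs is one (i, j) grid coordinate, with i (the 0:8 range) varying fastest, so objp ends up listing all 48 grid points with a zero in the last column.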

Extract blocks or patches from NumPy Array

I have a 2-d numpy array as follows:
a = np.array([[1,5,9,13],
[2,6,10,14],
[3,7,11,15],
[4,8,12,16]])
I want to split it into patches of size 2 by 2 without repeating any elements.
The result should match the following exactly. It can be a 3-d array or a list, with the elements in the same order as below:
[[[1,5],
[2,6]],
[[3,7],
[4,8]],
[[9,13],
[10,14]],
[[11,15],
[12,16]]]
How can I do this easily?
In my real problem the size of a is (36, 72). I cannot do it one by one. I want a programmatic way of doing it.
Using scikit-image:
import numpy as np
from skimage.util import view_as_blocks
a = np.array([[1,5,9,13],
[2,6,10,14],
[3,7,11,15],
[4,8,12,16]])
print(view_as_blocks(a, (2, 2)))
You can achieve it with a combination of np.reshape and np.swapaxes like so -
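def extract_blocks(a, blocksize, keep_as_view=False):
    M, N = a.shape
    b0, b1 = blocksize
    if not keep_as_view:
        # 3D output of shape (number_of_blocks, b0, b1); this forces a copy
        return a.reshape(M//b0, b0, N//b1, b1).swapaxes(1, 2).reshape(-1, b0, b1)
    else:
        # 4D output of shape (M//b0, N//b1, b0, b1); a view into a
        return a.reshape(M//b0, b0, N//b1, b1).swapaxes(1, 2)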
As can be seen there are two ways to use it - With keep_as_view flag turned off (default one) or on. With keep_as_view = False, we are reshaping the swapped-axes to a final output of 3D, while with keep_as_view = True, we will keep it 4D and that will be a view into the input array and hence, virtually free on runtime. We will verify it with a sample case run later on.
Sample cases
Let's use a sample input array, like so -
In [94]: a
Out[94]:
array([[2, 2, 6, 1, 3, 6],
[1, 0, 1, 0, 0, 3],
[4, 0, 0, 4, 1, 7],
[3, 2, 4, 7, 2, 4],
[8, 0, 7, 3, 4, 6],
[1, 5, 6, 2, 1, 8]])
Now, let's use some block-sizes for testing. Let's use a blocksize of (2,3) with the view-flag turned off and on -
In [95]: extract_blocks(a, (2,3)) # Blocksize : (2,3)
Out[95]:
array([[[2, 2, 6],
[1, 0, 1]],
[[1, 3, 6],
[0, 0, 3]],
[[4, 0, 0],
[3, 2, 4]],
[[4, 1, 7],
[7, 2, 4]],
[[8, 0, 7],
[1, 5, 6]],
[[3, 4, 6],
[2, 1, 8]]])
In [48]: extract_blocks(a, (2,3), keep_as_view=True)
Out[48]:
array([[[[2, 2, 6],
[1, 0, 1]],
[[1, 3, 6],
[0, 0, 3]]],
[[[4, 0, 0],
[3, 2, 4]],
[[4, 1, 7],
[7, 2, 4]]],
[[[8, 0, 7],
[1, 5, 6]],
[[3, 4, 6],
[2, 1, 8]]]])
Verify view with keep_as_view=True
In [20]: np.shares_memory(a, extract_blocks(a, (2,3), keep_as_view=True))
Out[20]: True
Let's check out performance on a large array and verify the virtually free runtime claim as discussed earlier -
In [42]: a = np.random.rand(2000,3000)
In [43]: %timeit extract_blocks(a, (2,3), keep_as_view=True)
1000000 loops, best of 3: 801 ns per loop
In [44]: %timeit extract_blocks(a, (2,3), keep_as_view=False)
10 loops, best of 3: 29.1 ms per loop
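One detail worth noting: the reshape and swapaxes approach returns the blocks in row-major order (left to right, then top to bottom), whereas the expected output in the question lists them column-first (top to bottom, then left to right). If that exact order matters, a possible variant is sketched below. The function name extract_blocks_colfirst is hypothetical, and like the original it assumes the array shape is divisible by the block size.

```python
import numpy as np

def extract_blocks_colfirst(a, blocksize):
    """Return non-overlapping blocks, ordered down each block-column first."""
    M, N = a.shape
    b0, b1 = blocksize
    # r[i, p, j, q] == a[i*b0 + p, j*b1 + q]
    r = a.reshape(M // b0, b0, N // b1, b1)
    # Move the block-column index j to the front, then the block-row index i.
    return r.transpose(2, 0, 1, 3).reshape(-1, b0, b1)

a = np.array([[1, 5, 9, 13],
              [2, 6, 10, 14],
              [3, 7, 11, 15],
              [4, 8, 12, 16]])

blocks = extract_blocks_colfirst(a, (2, 2))
print(blocks)
```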
Here's a rather cryptic numpy one-liner to generate your 3-d array, called result1 here:
In [60]: x
Out[60]:
array([[2, 1, 2, 2, 0, 2, 2, 1, 3, 2],
[3, 1, 2, 1, 0, 1, 2, 3, 1, 0],
[2, 0, 3, 1, 3, 2, 1, 0, 0, 0],
[0, 1, 3, 3, 2, 0, 3, 2, 0, 3],
[0, 1, 0, 3, 1, 3, 0, 0, 0, 2],
[1, 1, 2, 2, 3, 2, 1, 0, 0, 3],
[2, 1, 0, 3, 2, 2, 2, 2, 1, 2],
[0, 3, 3, 3, 1, 0, 2, 0, 2, 1]])
In [61]: result1 = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2).swapaxes(1, 2).reshape(-1, 2, 2)
result1 is like a 1-d array of 2-d arrays:
In [68]: result1.shape
Out[68]: (20, 2, 2)
In [69]: result1[0]
Out[69]:
array([[2, 1],
[3, 1]])
In [70]: result1[1]
Out[70]:
array([[2, 2],
[2, 1]])
In [71]: result1[5]
Out[71]:
array([[2, 0],
[0, 1]])
In [72]: result1[-1]
Out[72]:
array([[1, 2],
[2, 1]])
(Sorry, I don't have time at the moment to give a detailed breakdown of how it works. Maybe later...)
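In the meantime, here is one possible step-by-step reading of the one-liner, sketched on a small 4-by-6 array so the shapes are easy to follow. The array and the names step1, step2 and result are assumptions for illustration.

```python
import numpy as np

x = np.arange(24).reshape(4, 6)

# Step 1: split each axis into (block index, offset within block).
# step1[i, p, j, q] == x[2*i + p, 2*j + q]
step1 = x.reshape(2, 2, 3, 2)

# Step 2: bring the two block indices together.
# step2[i, j] is now the 2x2 block in block-row i, block-column j.
step2 = step1.swapaxes(1, 2)       # shape (2, 3, 2, 2), still a view of x

# Step 3: merge the two block indices into a single one, k = 3*i + j.
result = step2.reshape(-1, 2, 2)   # shape (6, 2, 2)

print(result[0])   # same values as x[0:2, 0:2]
print(result[4])   # same values as x[2:4, 2:4]
```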
Here's a less cryptic version that uses a nested list comprehension. In this case, result2 is a python list of 2-d numpy arrays:
In [73]: result2 = [x[2*j:2*j+2, 2*k:2*k+2] for j in range(x.shape[0]//2) for k in range(x.shape[1]//2)]
In [74]: result2[5]
Out[74]:
array([[2, 0],
[0, 1]])
In [75]: result2[-1]
Out[75]:
array([[1, 2],
[2, 1]])