Element-wise Euclidean distances between 2 numpy arrays

I have two Numpy arrays, each having n rows:
a = [[X1a, Y1a], [X2a, Y2a], .. , [Xna, Yna]]
b = [[X1b, Y1b], [X2b, Y2b], .. , [Xnb, Ynb]]
How can I get a new table with the Euclidean distance of each corresponding row?
c = [dis(1a, 1b), dis(2a, 2b), .. , dis(na, nb)]
or maybe
c = [[dis(1a, 1b)], [dis(2a, 2b)], .. , [dis(na, nb)]]

import math
c = []
for i in range(a.shape[0]):
    c.append(math.sqrt((a[i][0] - b[i][0])**2 + (a[i][1] - b[i][1])**2))
This will work.
a.shape[0] will give the value of n
For inputs
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
You will get
c = [1.4142135623730951, 2.8284271247461903, 2.8284271247461903]

I find vectorizing is more pythonic and faster:
a = np.array(a)
b = np.array(b)
np.sqrt(np.sum((a-b)**2,axis=1))
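As a quick sanity check, here is a minimal sketch that breaks the vectorized expression into its steps and compares it with an explicit loop. The names diff, sq, c_vec, c_loop, p, q and d3 are only illustrative.

```python
import math
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])

diff = a - b                      # row-wise differences, shape (n, 2)
sq = diff ** 2                    # squared difference per coordinate
c_vec = np.sqrt(sq.sum(axis=1))   # sum across columns: one value per row

# Reference: explicit loop over corresponding rows.
c_loop = [math.sqrt((a[i][0] - b[i][0]) ** 2 + (a[i][1] - b[i][1]) ** 2)
          for i in range(a.shape[0])]
print(np.allclose(c_vec, c_loop))  # True

# The same expression works unchanged for points with more coordinates.
p = np.array([[0, 0, 0], [1, 1, 1]])
q = np.array([[1, 2, 2], [1, 1, 1]])
d3 = np.sqrt(np.sum((p - q) ** 2, axis=1))
```

Because the sum runs over axis=1, nothing in the expression depends on the points being 2-D.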

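There are loads of examples of using scipy's cdist or pdist, or just numpy's einsum, to calculate distances. They scale to multiple dimensions as well. Note that the two examples here compute the distance between every pair of rows; the distances between corresponding rows are on the diagonal of the result.
import numpy as np
from scipy.spatial.distance import cdist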
a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
cdist(a, b)
Out[14]:
array([[ 1.414, 0.000, 2.828],
       [ 3.162, 2.828, 0.000],
       [ 5.831, 5.657, 2.828]])
or
a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
b = b[:, np.newaxis]
diff = a - b
np.sqrt(np.einsum('ijk,ijk->ij', diff, diff))
array([[ 1.414, 3.162, 5.831],
       [ 0.000, 2.828, 5.657],
       [ 2.828, 0.000, 2.828]])
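If only the distances between corresponding rows are needed, the full pairwise matrix does more work than necessary (n*n distances instead of n). A minimal sketch, using plain NumPy broadcasting in place of cdist so that it runs without SciPy; pairwise and rowwise are illustrative names:

```python
import numpy as np

a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])

# Full pairwise matrix: pairwise[i, j] is the distance from a[i] to b[j].
pairwise = np.sqrt(((a[:, np.newaxis, :] - b[np.newaxis, :, :]) ** 2).sum(axis=-1))

# Row-wise only: 'ij,ij->i' multiplies element-wise and sums over j.
diff = a - b
rowwise = np.sqrt(np.einsum('ij,ij->i', diff, diff))

# The row-wise distances are the diagonal of the pairwise matrix.
print(np.allclose(np.diag(pairwise), rowwise))  # True
```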

The Euclidean distance is also known as the 2-norm. numpy.linalg.norm will calculate this efficiently across your vectors:
import numpy as np
import numpy.linalg as la
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
c = la.norm(a - b, axis=1)
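If you want the column layout from the question (one distance per row, as an n x 1 array), norm also accepts keepdims=True. A small sketch; c_flat and c_col are illustrative names:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])

c_flat = np.linalg.norm(a - b, axis=1)                 # shape (3,)
c_col = np.linalg.norm(a - b, axis=1, keepdims=True)   # shape (3, 1)

print(c_flat.shape, c_col.shape)  # (3,) (3, 1)
```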

Related

Looking for an efficient way to index 2D tensor by another 2D tensor in pytorch

I have a tensor, say
A = tensor([
[0, 0],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[1, 0],
[1, 1],
[1, 4],
[1, 5],
[1, 6]
])
and the other tensor
b = tensor([[0, 2], [1, 2]])
I would like to find an efficient way to index into A by b such that the result is
result = tensor([[0, 3], [1, 4]])
That is, match A’s first column of last dim (i.e. [0,…,1…]) with b’s first column of the last dim (i.e. [0,1]) by their values and then use b’s second column (i.e. [2, 2]) to index A’s second column.
Thanks
I worked out a solution by converting it to a one-dimensional problem with torch.nonzero and offsetting by the mask sum.
Instead of the original A, get a flattened version, like
A = tensor([[ 0], [ 2], [ 3], [ 4], [ 5], [ 7], [ 8], [11], [12]])
and also calculate the offsets along the batch dimension,
offset = tensor([[0], [5], [4]])
Similarly, get b
b = tensor([2, 2])
and
offset_b = b+offset.reshape(-1)[:-1]
Then
indices=A.reshape(-1)[offset_b]
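For the example in the question, the same offset idea can be sketched end to end. This is not the code from the answer above; it assumes the rows of A are grouped by their first-column value in ascending order (as they are in the question) and uses torch.bincount and torch.cumsum to find each group's starting row. counts, starts and rows are illustrative names.

```python
import torch

A = torch.tensor([
    [0, 0], [0, 2], [0, 3], [0, 4], [0, 5], [0, 6],
    [1, 0], [1, 1], [1, 4], [1, 5], [1, 6],
])
b = torch.tensor([[0, 2], [1, 2]])

# Assumption: rows of A are grouped by first-column value, in ascending order.
counts = torch.bincount(A[:, 0])                 # rows per group: 6 and 5
starts = torch.cumsum(counts, dim=0) - counts    # first row of each group: 0 and 6

# Row position in A = start of the group + position within the group.
rows = starts[b[:, 0]] + b[:, 1]
result = A[rows]
print(result.tolist())  # [[0, 3], [1, 4]]
```

There is no Python loop over groups here, so the cost is one bincount, one cumsum and one indexing operation regardless of how many rows b has.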

How to index a 3D tensor with another 3D tensor?

For example, if I have an input tensor:
x = [[[0, 1], [2, -1]], [[5, 1], [-10, -100]]]  # batch x 2 x 2 dimensionality
and an index tensor:
ind = [[[1], [0]], [[0], [0]]]
How do I go about indexing x with ind in order to obtain:
out = [[[1], [2]], [[5], [-10]]]
You can use torch.Tensor.gather, which along dim 2 computes out[i][j][k] = x[i][j][ind[i][j][k]]:
import torch
x = torch.tensor([[[0, 1], [2, -1]], [[5, 1], [-10, -100]]])
ind = torch.tensor([[[1], [0]], [[0], [0]]])
out = x.gather(2, ind)
which results in
>>> out
tensor([[[  1],
         [  2]],
        [[  5],
         [-10]]])

Create 2d tensor of points using tensorflow

I'm trying to make a tensor with all the points within a certain range.
For example
min_x = 5
max_x = 7
min_y = 3
max_y = 5
points = get_points(min_x, max_x, min_y, max_y)
print(points) # [[5, 3], [5, 4], [5, 5], [6, 3], [6, 4], [6, 5], [7, 3], [7, 4], [7, 5]]
I'm trying to do this inside a TensorFlow function, i.e. one decorated with @tf.function.
Also, all the inputs to get_points need to be tensors.
Thanks, I'm new to TensorFlow, as you can tell.
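You can use tf.meshgrid, then reshape the two tensors and stack x and y along the last dim. With the default indexing the rows come out with x varying fastest; passing indexing='ij' to tf.meshgrid gives the x-major order shown in the question.
import tensorflow as tf
from tensorflow.keras import backend as K  # assumed source of K below
min_x = 5
max_x = 7
min_y = 3
max_y = 5
def get_points(min_x, max_x, min_y, max_y):
    # Grid of all (x, y) combinations, with both upper bounds included
    x, y = tf.meshgrid(tf.range(min_x, max_x + 1), tf.range(min_y, max_y + 1))
    _x = tf.reshape(x, (-1, 1))
    _y = tf.reshape(y, (-1, 1))
    return tf.squeeze(tf.stack([_x, _y], axis=-1))
res = get_points(min_x, max_x, min_y, max_y)
K.eval(res)
# array([[5, 3],
#        [6, 3],
#        [7, 3],
#        [5, 4],
#        [6, 4],
#        [7, 4],
#        [5, 5],
#        [6, 5],
#        [7, 5]], dtype=int32)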

Random valid data items in numpy array

Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n valid (non-NaN) items from the array, along with their indices.
Does numpy provide an efficient way of doing this?
Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
Pick random subset of size n by index
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
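Note that np.random.choice samples with replacement by default, which is why position (3, 3) appears twice in the output above. If the n picks should be distinct, here is a minimal sketch along the same lines, using the newer Generator API. The seed and the names rng, valid, rows, cols and values are illustrative.

```python
import numpy as np

data = np.array([[1, 3, 8, np.nan],
                 [np.nan, 6, 7, 9],
                 [np.nan, 0, 1, 2],
                 [5, np.nan, np.nan, 2]])
n = 5

rng = np.random.default_rng(0)         # seed chosen only for repeatability
valid = np.argwhere(~np.isnan(data))   # (row, col) of every non-NaN entry

# replace=False guarantees n distinct positions; it requires n <= len(valid).
pick = rng.choice(len(valid), size=n, replace=False)
rows, cols = valid[pick].T
values = data[rows, cols]
```

rows and cols are the indices of the selected items and values holds the items themselves.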
Find the valid positions with np.argwhere (np.nonzero(data) would not do here: NaN counts as nonzero and the valid 0 would be dropped):
In [37]: a = np.argwhere(~np.isnan(data))
In [38]: a
Out[38]:
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 0],
       [3, 3]])
Now pick one at random:
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0],a[idx][1]]
Out[45]: 2.0

overwriting "|" operator to concatenate numpy arrays

I am wondering how to overload/overwrite the | operator to concatenate (two-dimensional) numpy arrays along the second axis, so that
u = np.array([[7], [8], [9]])
v = np.array([[1, 2], [3, 4], [5, 6]])
w = u | v
produces the same result as
u = np.array([[7], [8], [9]])
v = np.array([[1, 2], [3, 4], [5, 6]])
w = np.concatenate((u, v), axis=1)
i.e., results in
[[7, 1, 2],
[8, 3, 4],
[9, 5, 6]]
being assigned to w.
NB: The original meaning of | is elucidated in the first comment below.
PS: I am willing to edit the numpy source code.
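One way to get this behaviour without touching the numpy source is to subclass ndarray and override __or__. A minimal sketch, with HCat as a hypothetical class name; note that instances of it lose the usual element-wise bitwise OR:

```python
import numpy as np

class HCat(np.ndarray):
    """Hypothetical ndarray subclass whose | concatenates along axis 1."""

    def __or__(self, other):
        # self on the left: append other's columns to the right of self
        return np.concatenate((np.asarray(self), np.asarray(other)), axis=1).view(HCat)

    def __ror__(self, other):
        # self on the right of a plain array: other's columns come first
        return np.concatenate((np.asarray(other), np.asarray(self)), axis=1).view(HCat)

u = np.array([[7], [8], [9]]).view(HCat)   # view, no copy of the data
v = np.array([[1, 2], [3, 4], [5, 6]])
w = u | v
print(w.tolist())  # [[7, 1, 2], [8, 3, 4], [9, 5, 6]]
```

Only one operand needs to be an HCat view; because the result is again an HCat, chains such as u | v | v keep concatenating.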