I have a tensor, say
A = tensor([
[0, 0],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[1, 0],
[1, 1],
[1, 4],
[1, 5],
[1, 6]
])
and the other tensor
b = tensor([[0, 2], [1, 2]])
I would like to find an efficient way to index into A by b such that the result is
result = tensor([[0, 3], [1, 4]])
That is, match the values in A’s first column (i.e. [0, …, 1, …]) with the values in b’s first column (i.e. [0, 1]), then use b’s second column (i.e. [2, 2]) to pick the row at that position within each matching group of A.
Thanks
I worked out a solution by converting it to a one-dimensional problem with torch.nonzero and offsetting by the mask sum.
Instead of the original A, get a flattened version, like
A = tensor([[ 0], [ 2], [ 3], [ 4], [ 5], [ 7], [ 8], [11], [12]])
and also calculate the offsets along the batch dimension,
offset = tensor([[0], [5], [4]])
Similarly, get b
b = tensor([2, 2])
and
offset_b = b+offset.reshape(-1)[:-1]
Then
indices=A.reshape(-1)[offset_b]
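For reference, the same offset idea can be sketched in NumPy against the A and b from the question. This is only a sketch under assumptions: the rows of A are grouped by first column in increasing order, the group ids are non-negative integers, and every position in b exists in its group. The names counts, starts and flat_idx are mine, not from the posts. In PyTorch, torch.bincount and torch.cumsum play the same roles.

```python
import numpy as np

A = np.array([[0, 0], [0, 2], [0, 3], [0, 4], [0, 5], [0, 6],
              [1, 0], [1, 1], [1, 4], [1, 5], [1, 6]])
b = np.array([[0, 2], [1, 2]])

# Number of rows in each group (group id = first column of A).
counts = np.bincount(A[:, 0])          # 6 rows for group 0, 5 for group 1
# Row at which each group starts: exclusive cumulative sum of the counts.
starts = np.cumsum(counts) - counts    # group 0 starts at row 0, group 1 at row 6
# Position in A = start of the group + position inside the group.
flat_idx = starts[b[:, 0]] + b[:, 1]
result = A[flat_idx]
print(result)
```

Because the lookup is a single fancy-indexing step, no Python loop over the rows of b is needed.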
I have two numpy matrices (6 rows and 3 columns) :
a = np.array([[1,2,4],[3,6,2],[3,4,7],[9,7,7],[6,3,1],[3,5,9]])
b = np.array([[4,5,2],[9,2,5],[1,5,6],[4,5,6],[1,2,6],[6,4,3]])
a = array([[1, 2, 4],
[3, 6, 2],
[3, 4, 7],
[9, 7, 7],
[6, 3, 1],
[3, 5, 9]])
b = array([[4, 5, 2],
[9, 2, 5],
[1, 5, 6],
[4, 5, 6],
[1, 2, 6],
[6, 4, 3]])
I would like to calculate the Pearson correlation coefficient between the first columns of a and b, the second columns of a and b, and the third columns of a and b.
The result would be a vector of 3 correlation coefficients.
One way using numpy.corrcoef and diagonal:
corr = np.corrcoef(a.T, b.T).diagonal(a.shape[1])
corr
Output:
array([-0.2324843 , -0.03631365, -0.18057878])
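Why the offset diagonal works: np.corrcoef(a.T, b.T) treats the 3 columns of a and the 3 columns of b as 6 variables and returns a 6x6 matrix, so the entry for (column i of a, column i of b) sits at position (i, i + 3), which is the diagonal shifted by a.shape[1]. As a cross-check, here is a minimal sketch that applies the Pearson formula to each column pair directly. The names da, db and corr_direct are mine.

```python
import numpy as np

a = np.array([[1, 2, 4], [3, 6, 2], [3, 4, 7], [9, 7, 7], [6, 3, 1], [3, 5, 9]])
b = np.array([[4, 5, 2], [9, 2, 5], [1, 5, 6], [4, 5, 6], [1, 2, 6], [6, 4, 3]])

# Center each column, then apply the Pearson formula column by column.
da = a - a.mean(axis=0)
db = b - b.mean(axis=0)
corr_direct = (da * db).sum(axis=0) / np.sqrt(
    (da ** 2).sum(axis=0) * (db ** 2).sum(axis=0)
)

# Same values as the offset diagonal of the full 6x6 correlation matrix.
corr_diag = np.corrcoef(a.T, b.T).diagonal(a.shape[1])
assert np.allclose(corr_direct, corr_diag)
```

The direct version avoids building the full matrix, which may matter when the arrays have many columns.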
I have two Numpy arrays, each having n rows:
a = [[X1a, Y1a], [X2a, Y2a], .. , [Xna, Yna]]
b = [[X1b, Y1b], [X2b, Y2b], .. , [Xnb, Ynb]]
How can I get a new table with the Euclidean distance of each corresponding row?
c = [dis(1a, 1b), dis(2a, 2b), .. , dis(na, nb)]
or maybe
c = [[dis(1a, 1b)], [dis(2a, 2b)], .. , [dis(na, nb)]]
import math

c = []
for i in range(a.shape[0]):
    # distance between row i of a and row i of b
    c.append(math.sqrt((a[i][0] - b[i][0])**2 + (a[i][1] - b[i][1])**2))
This will work.
a.shape[0] will give the value of n
For inputs
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
You will get
c = [1.4142135623730951, 2.8284271247461903, 2.8284271247461903]
I find the vectorized version more Pythonic and faster:
a = np.array(a)
b = np.array(b)
np.sqrt(np.sum((a-b)**2,axis=1))
There are loads of examples of using scipy's cdist or pdist, or just numpy's einsum, to calculate distances. They scale to multiple dimensions as well.
import numpy as np
from scipy.spatial.distance import cdist
a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
cdist(a, b)
Out[14]:
array([[ 1.414, 0.000, 2.828],
[ 3.162, 2.828, 0.000],
[ 5.831, 5.657, 2.828]])
or
a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
b = b[:, np.newaxis]
diff = a - b
np.sqrt(np.einsum('ijk,ijk->ij', diff, diff))
array([[ 1.414, 3.162, 5.831],
[ 0.000, 2.828, 5.657],
[ 2.828, 0.000, 2.828]])
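Note that both outputs above are full pairwise matrices (every row of one array against every row of the other), while the question asks only for corresponding rows. Those row-wise distances are the diagonal of the pairwise matrix. A small NumPy-only sketch of the relationship, with variable names of my own choosing:

```python
import numpy as np

a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])

# All pairwise distances via broadcasting: pairwise[i, j] = dist(a[i], b[j]).
diff_all = a[:, np.newaxis, :] - b[np.newaxis, :, :]
pairwise = np.sqrt(np.einsum('ijk,ijk->ij', diff_all, diff_all))

# Row-wise distances only: sum the squared differences within each row.
diff = a - b
rowwise = np.sqrt(np.einsum('ij,ij->i', diff, diff))

# The row-wise result is the diagonal of the pairwise matrix.
assert np.allclose(rowwise, np.diag(pairwise))
```

For large n the row-wise form is preferable, since it needs O(n) memory instead of O(n^2).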
The Euclidean distance is also known as the 2-norm. numpy.linalg.norm will calculate this efficiently across your vectors:
import numpy as np
import numpy.linalg as la
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
c = la.norm(a - b, axis=1)
I'm new to numpy. I understand the methods for "Joining arrays" for low-dimensional shapes such as (n1, n2) because we can visualize them, like a matrix.
But I don't understand the logic in higher dimensions (n0, ...., n_{d-1}), which of course I can't visualize. To visualize, I usually imagine a multidimensional array as a tree, so (n0, ...., n_{d-1}) means that at level (axis) i of the tree every node has n_{i} children. So at level 0 (the root) we have n0 children, and so on.
In essence, what is the formal, exact definition of the "Joining arrays" algorithms?
https://numpy.org/doc/stable/reference/routines.array-manipulation.html
Let's see if I can illustrate some basic array operations.
First make a 2d array. Start with a 1d, [0,1,...5], and reshape it to (2,3):
In [1]: x = np.arange(6).reshape(2,3)
In [2]: x
Out[2]:
array([[0, 1, 2],
[3, 4, 5]])
I can join 2 copies of x along the 1st dimension (vstack, v for vertical also does this):
In [3]: np.concatenate([x,x], axis=0)
Out[3]:
array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5]])
Note that the result is (4,3); no new dimension.
Or join them 'horizontally':
In [4]: np.concatenate([x,x], axis=1)
Out[4]:
array([[0, 1, 2, 0, 1, 2], # (2,6) shape
[3, 4, 5, 3, 4, 5]])
But if I supply them to np.array I make a 3d array (2,2,3) shape:
In [5]: np.array([x,x])
Out[5]:
array([[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]]])
This action of np.array is really no different from making a 2d array from nested lists, np.array([[1,2],[3,4]]). We could just add a layer of nesting, just like Out[5] without the line breaks. I tend to think of this 3d array as having 2 blocks, each with 2 rows and 3 columns. But the names are just a convenience.
stack acts like np.array, making a 3d array. It actually changes the input arrays to (1,2,3) shape, and concatenates on the first axis.
In [6]: np.stack([x,x])
Out[6]:
array([[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]]])
stack lets us join the arrays in other ways:
In [7]: np.stack([x,x], axis=1) # expand to (2,1,3) and concatenate
Out[7]:
array([[[0, 1, 2],
[0, 1, 2]],
[[3, 4, 5],
[3, 4, 5]]])
In [8]: np.stack([x,x], axis=2) # expand to (2,3,1) and concatenate
Out[8]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]])
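The "expand, then concatenate" description of stack can be checked directly. The sketch below is my own illustration, not NumPy's actual implementation: it inserts a length-1 axis with np.expand_dims and concatenates along that axis, then compares against np.stack for each of the three axes.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

shapes = []
for axis in range(3):
    # Insert a length-1 axis at the stacking position, then concatenate along it.
    expanded = [np.expand_dims(x, axis) for _ in range(2)]
    manual = np.concatenate(expanded, axis=axis)
    assert np.array_equal(manual, np.stack([x, x], axis=axis))
    shapes.append(manual.shape)

print(shapes)
```

The recorded shapes show where the new axis of length 2 ends up for each choice of axis.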
concatenate and the other stack functions don't add anything new to basic numpy arrays. They just provide ways of making a new array from existing ones. There aren't any special algorithms.
If it helps you could think of these join functions as creating a new "blank" array, and filling it with copies of the source arrays. For example that last stack can be done with:
In [9]: res = np.zeros((2,3,2), int)
In [10]: res
Out[10]:
array([[[0, 0],
[0, 0],
[0, 0]],
[[0, 0],
[0, 0],
[0, 0]]])
In [11]: res[:,:,0] = x
In [12]: res[:,:,1] = x
In [13]: res
Out[13]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]])
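The same "blank array" picture applies to concatenate, where no new axis appears and the lengths along the join axis simply add up. A minimal sketch for the earlier axis=1 join, using np.empty as the blank array (my choice, since every element is overwritten):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# Result shape: same as x except along the join axis, where the lengths add up.
res = np.empty((2, 6), dtype=x.dtype)
res[:, :3] = x   # first copy fills columns 0..2
res[:, 3:] = x   # second copy fills columns 3..5

assert np.array_equal(res, np.concatenate([x, x], axis=1))
```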
I have a numpy array A as follows:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
and another numpy array column_indices_to_be_deleted as follows:
array([1, 0, 2])
I want to delete the element from every row of A specified by the column indices in column_indices_to_be_deleted. So, column index 1 from row 0, column index 0 from row 1 and column index 2 from row 2 in this case, to get a new array that looks like this:
array([[1, 3],
[5, 6],
[7, 8]])
What would be the simplest way of doing that?
One way is masking, with the mask created by a broadcasted comparison -
In [43]: a # input array
Out[43]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [44]: remove_idx # indices to be removed from each row
Out[44]: array([1, 0, 2])
In [45]: n = a.shape[1]
In [46]: a[remove_idx[:,None]!=np.arange(n)].reshape(-1,n-1)
Out[46]:
array([[1, 3],
[5, 6],
[7, 8]])
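To see what the broadcasted comparison produces, it can help to look at the intermediate mask. In this sketch (the name keep is mine), remove_idx[:, None] has shape (3, 1) and np.arange(n) has shape (3,), so the comparison broadcasts to a (3, 3) boolean array that is False exactly at the element to drop in each row.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
remove_idx = np.array([1, 0, 2])
n = a.shape[1]

# keep[i, j] is True when column j should be kept in row i.
keep = remove_idx[:, None] != np.arange(n)
print(keep)

# Boolean indexing flattens the kept elements in row-major order;
# each row lost exactly one element, so reshape to n - 1 columns.
out = a[keep].reshape(-1, n - 1)
print(out)
```

The reshape relies on exactly one element being removed per row; with a varying number of removals the result would be ragged and could not be a regular 2d array.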
Another mask based approach with the mask created with array-assignment -
In [47]: mask = np.ones(a.shape,dtype=bool)
In [48]: mask[np.arange(len(remove_idx)), remove_idx] = 0
In [49]: a[mask].reshape(-1,a.shape[1]-1)
Out[49]:
array([[1, 3],
[5, 6],
[7, 8]])
Another with np.delete -
In [64]: m,n = a.shape
In [66]: np.delete(a.flat,remove_idx+n*np.arange(m)).reshape(m,-1)
Out[66]:
array([[1, 3],
[5, 6],
[7, 8]])
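The np.delete variant works on the flattened array, so each (row, column) pair has to be turned into a flat position first. In row-major order that position is row * n + column, which is what remove_idx + n*np.arange(m) computes. A short sketch with the intermediate values exposed (flat_idx and removed are my names, and I use a.ravel() in place of a.flat):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
remove_idx = np.array([1, 0, 2])
m, n = a.shape

# Flat position of the element to drop in each row: row * n + column.
flat_idx = remove_idx + n * np.arange(m)
removed = a.ravel()[flat_idx]   # the elements that will be deleted

out = np.delete(a.ravel(), flat_idx).reshape(m, -1)
print(flat_idx, removed)
print(out)
```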