Fastest way to compare neighboring elements in multi-dimensional numpy array

What is the fastest way to compare neighboring elements in a 3-dimensional array?
Assume I have a numpy array of shape (4,4,4). I want to loop in the k-direction and compare elements in pairs: compare all neighboring elements and, if they are not equal, assign a value to the element with the lower index. Essentially this:
if array[0, 0, 0] != array[0, 0, 1]:
    array[0, 0, 0] = 111
Thus, the comparisons would be:
(0, 0, 0) and (0, 0, 1)
(0, 0, 1) and (0, 0, 2)
(0, 0, 2) and (0, 0, 3)
... for all i and j ...
However, I want to do this for every i and j in the array, and a standard Python for loop over huge arrays with millions of cells is incredibly slow. Is there a more idiomatic NumPy way to do this without an explicit for loop?
Maybe there's some trick using slicing steps (e.g. array[:,:,::2], array[:,:,1::2])?

Try np.diff.
import numpy as np
a = np.arange(9).reshape(3, 3)
A = np.array([a, a, a + 1]).T
same_with_neighbor_on_last_axis = np.diff(A, axis=-1) == 0
print(A)
print(same_with_neighbor_on_last_axis)
A is constructed to have 2 consecutive equal entries along the third axis:
>>> A
array([[[0, 0, 1],
        [3, 3, 4],
        [6, 6, 7]],

       [[1, 1, 2],
        [4, 4, 5],
        [7, 7, 8]],

       [[2, 2, 3],
        [5, 5, 6],
        [8, 8, 9]]])
The resulting boolean array is
>>> print(same_with_neighbor_on_last_axis)
[[[ True False]
  [ True False]
  [ True False]]

 [[ True False]
  [ True False]
  [ True False]]

 [[ True False]
  [ True False]
  [ True False]]]
Using the axis keyword, you can choose whichever axis you need to do this operation on; if you need all of them, loop over the axes. np.diff does little more than the following:
np.diff(A, axis=-1)  # same values as A[..., 1:] - A[..., :-1]
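Applied to the original question, the same neighbour slicing can replace the explicit loop entirely. A minimal sketch, assuming the comparison runs along the last axis and that the lower-index element of each unequal pair receives the marker (111 in the question); `mark_unequal_neighbors` is a hypothetical helper name:

```python
import numpy as np


def mark_unequal_neighbors(arr, value=111):
    """Hypothetical helper: return a copy of `arr` in which arr[..., k] is set
    to `value` wherever arr[..., k] != arr[..., k + 1] (last axis)."""
    out = arr.copy()
    # Compare every element with its right-hand neighbour along the last axis;
    # the mask has shape arr.shape[:-1] + (arr.shape[-1] - 1,).
    differs = arr[..., :-1] != arr[..., 1:]
    # out[..., :-1] is a view, so the boolean assignment writes into `out`.
    out[..., :-1][differs] = value
    return out


a = np.arange(9).reshape(3, 3)
A = np.array([a, a, a + 1]).T
print(mark_unequal_neighbors(A)[0, 0].tolist())  # [0, 111, 1]
```

Because the mask is computed from the unmodified input, the result matches the sequential loop in the question: overwriting position k never changes the later comparison between positions k + 1 and k + 2.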

Related

Given two arrays, `a` and `b`, how to find efficiently all combinations of elements in `b` that have equal value in `a`?

Here is an example:
Given
a = [0, 0, 0, 1, 1, 2, 2, 2, 2]
b = [1, 2, 4, 5, 9, 3, 7, 22, 10]
how would you calculate the following c?
c = [[1, 2],
     [1, 4],
     [2, 4],
     [5, 9],
     [3, 7],
     [3, 22],
     [3, 10],
     [7, 22],
     [7, 10],
     [22, 10]]
a can be assumed to be sorted.
I can do this with a loop, like so:
import torch
a = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])
b = torch.tensor([1, 2, 4, 5, 9, 3, 7, 22, 10])
jumps = torch.cat((torch.tensor([0]),
torch.where(a.diff() > 0)[0] + 1,
torch.tensor([len(a)])))
cs = []
for i in range(len(jumps) - 1):
    cs.append(torch.combinations(b[jumps[i]:jumps[i + 1]]))
c = torch.cat(cs)
Is there any efficient way to avoid the loop? The solution should work for CPU and CUDA.
Also, the solution should have runtime O(m * m), where m is the largest number of equal elements in a, and not O(n * n), where n is the length of a.
I prefer PyTorch solutions, but I am curious about a NumPy solution as well.
I think the overhead of using torch is only justified for bigger datasets, since the function involves essentially no heavy computation. IMHO you can achieve the same results with:
from collections import Counter

def find_combinations1(a, b):
    count_a = Counter(a)
    combinations = []
    for x in set(b):
        if count_a[x] == b.count(x):
            combinations.append(x)
    return combinations
or, even simpler:
def find_combinations2(a, b):
    return list(set(a) & set(b))
With PyTorch, I assume the simplest approach is:
import torch

def find_combinations3(a, b):
    a = torch.tensor(a)
    b = torch.tensor(b)
    # Broadcast to a (len(b), len(a)) table of equality tests.
    eq = torch.eq(a, b.view(-1, 1))
    indices = torch.nonzero(eq)
    return indices[:, 1]
This option of course has a time complexity of O(n*m), where n is the size of a and m is the size of b, and it needs O(n*m) memory for the intermediate comparison tensor eq.
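The functions above return values or indices common to a and b rather than the pairs in c. Below is a loop-free sketch that does build c, written in NumPy and assuming a is sorted as the question states; `equal_value_pairs` is a hypothetical name. Its work and memory grow with len(a) plus the number of output pairs, so it never forms an n-by-n comparison.

```python
import numpy as np


def equal_value_pairs(a, b):
    """Hypothetical helper: all pairs (b[i], b[j]) with i < j and a[i] == a[j].

    Assumes `a` is sorted, so equal values form contiguous runs.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    idx = np.arange(len(a))
    # For each i, one past the last index of its run of equal values.
    run_end = np.searchsorted(a, a, side="right")
    # Number of partners j with i < j < run_end[i].
    n_partners = run_end - idx - 1
    # Left element of each pair, repeated once per partner.
    first = np.repeat(idx, n_partners)
    # Position inside each block of repeats: 0, 1, ..., n_partners[i] - 1.
    block_start = np.repeat(np.cumsum(n_partners) - n_partners, n_partners)
    offset = np.arange(first.size) - block_start
    second = first + 1 + offset
    return np.stack([b[first], b[second]], axis=1)


a = [0, 0, 0, 1, 1, 2, 2, 2, 2]
b = [1, 2, 4, 5, 9, 3, 7, 22, 10]
print(equal_value_pairs(a, b).tolist())
# [[1, 2], [1, 4], [2, 4], [5, 9], [3, 7], [3, 22], [3, 10], [7, 22], [7, 10], [22, 10]]
```

A PyTorch port (for CUDA) would presumably use torch.searchsorted, torch.repeat_interleave and torch.cumsum for the same steps; treat that translation as an untested assumption.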

how to get row indices where row slice contains a single value (0)

With the numpy array
arr = np.array([[1, 1, 0, 0, 0, 1], [1, 1, 0, 0, 1, 1], [1, 1, 0, 0, 0, 1]])
I would like to get the indices of all rows where the row slice 2:5 contains all zeros.
In the above example, it should return rows 0 and 2.
I tried:
zero_indices = np.where(not np.any(arr[:,2:5]))
but it doesn't seem to work.
I'm trying to do this over a large array with several million rows.
Try this:
np.nonzero((~arr[:,2:5].astype(bool)).all(1))[0]
Out[133]: array([0, 2], dtype=int32)
Or
np.nonzero((arr[:,2:5] == 0).all(1))[0]
Out[139]: array([0, 2], dtype=int32)
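The attempt in the question fails because np.any without an axis reduces the whole slice to a single boolean, and Python's `not` then negates that one scalar, so np.where never sees per-row values. A sketch that keeps the reduction per row with `axis=1` and negates elementwise with `~`:

```python
import numpy as np

arr = np.array([[1, 1, 0, 0, 0, 1],
                [1, 1, 0, 0, 1, 1],
                [1, 1, 0, 0, 0, 1]])

# any(axis=1) gives one boolean per row ("the slice has a non-zero");
# ~ negates it elementwise, and flatnonzero returns the matching row indices.
rows = np.flatnonzero(~arr[:, 2:5].any(axis=1))
print(rows.tolist())  # [0, 2]
```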

Reduce a dimension of numpy array by selecting

I have a 3d array
A = np.random.random((4,4,3))
and a index matrix
B = np.int_(np.random.random((4,4))*3)
How do I get a 2D array from A based on index matrix B?
In general, how to get a N-1 dimensional array from a ND array and a N-1 dimensional index array?
Let's take an example:
>>> A = np.random.randint(0,10,(3,3,2))
>>> A
array([[[0, 1],
        [8, 2],
        [6, 4]],

       [[1, 0],
        [6, 9],
        [7, 7]],

       [[1, 2],
        [2, 2],
        [9, 7]]])
Use fancy indexing to pick out individual elements. Note that all index arrays must have the same shape (or be broadcastable to a common shape), and that shape is the shape of the result.
>>> ind = np.arange(2)
>>> A[ind,ind,ind]
array([0, 9]) #Index (0,0,0) and (1,1,1)
>>> ind = np.arange(2).reshape(2,1)
>>> A[ind,ind,ind]
array([[0],
       [9]])
So for your example we need to supply the grid for the first two dimensions:
>>> A = np.random.random((4,4,3))
>>> B = np.int_(np.random.random((4,4))*3)
>>> A
array([[[ 0.95158697, 0.37643036, 0.29175815],
        [ 0.84093397, 0.53453123, 0.64183715],
        [ 0.31189496, 0.06281937, 0.10008886],
        [ 0.79784114, 0.26428462, 0.87899921]],

       [[ 0.04498205, 0.63823379, 0.48130828],
        [ 0.93302194, 0.91964805, 0.05975115],
        [ 0.55686047, 0.02692168, 0.31065731],
        [ 0.92822499, 0.74771321, 0.03055592]],

       [[ 0.24849139, 0.42819062, 0.14640117],
        [ 0.92420031, 0.87483486, 0.51313695],
        [ 0.68414428, 0.86867423, 0.96176415],
        [ 0.98072548, 0.16939697, 0.19117458]],

       [[ 0.71009607, 0.23057644, 0.80725518],
        [ 0.01932983, 0.36680718, 0.46692839],
        [ 0.51729835, 0.16073775, 0.77768313],
        [ 0.8591955 , 0.81561797, 0.90633695]]])
>>> B
array([[1, 2, 0, 0],
       [1, 2, 0, 1],
       [2, 1, 1, 1],
       [1, 2, 1, 2]])
>>> x,y = np.meshgrid(np.arange(A.shape[0]),np.arange(A.shape[1]))
>>> x
array([[0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3]])
>>> y
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])
>>> A[y,x,B]  # y holds the first-axis (row) indices, x the second-axis ones
array([[ 0.37643036, 0.64183715, 0.31189496, 0.79784114],
       [ 0.63823379, 0.05975115, 0.55686047, 0.74771321],
       [ 0.14640117, 0.87483486, 0.86867423, 0.16939697],
       [ 0.23057644, 0.46692839, 0.16073775, 0.90633695]])
If you prefer to use a mesh as suggested by Daniel, you may also use
A[tuple(np.ogrid[:A.shape[0], :A.shape[1]]) + (B,)]
to work with sparse (open) grids instead of full ones. In the general case you could use
A[tuple(np.ogrid[tuple(slice(0, end) for end in A.shape[:-1])]) + (B,)]
Note that this may also be used when you want B to index an axis other than the last one (see for example this answer about inserting an element into a list).
Otherwise you can do it using broadcasting:
A[np.arange(A.shape[0])[:, np.newaxis], np.arange(A.shape[1])[np.newaxis, :], B]
This may be generalized too but it's a bit more complicated.
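One way to generalize the broadcasting version to any number of leading axes is np.ix_, which builds the open (broadcastable) index arrays for you. A sketch, assuming B indexes the last axis of A and B.shape == A.shape[:-1]; `select_last_axis` is a hypothetical name:

```python
import numpy as np


def select_last_axis(A, B):
    """Hypothetical helper: out[i, j, ...] = A[i, j, ..., B[i, j, ...]]."""
    # np.ix_ returns one index array per leading axis, each shaped to
    # broadcast against the others (e.g. (4, 1) and (1, 4) for a 4x4 grid).
    grid = np.ix_(*[np.arange(n) for n in A.shape[:-1]])
    return A[grid + (B,)]


rng = np.random.default_rng(0)
A = rng.random((4, 4, 3))
B = rng.integers(0, 3, size=(4, 4))
out = select_last_axis(A, B)
print(out.shape)  # (4, 4)
```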

numpy custom array element retrieval

I have a question regarding how to extract certain values from a 2D numpy array:
Foo =
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
Bar =
array([[0, 0, 1],
       [1, 2, 3]])
I want to extract elements from Foo using the values of Bar as indices, so that I end up with a 2D array Baz of the same shape as Bar. The i-th column of Baz should hold the elements of Foo's i-th column at the rows listed in Bar[:, i], i.e. Baz[:, i] == Foo[Bar[:, i], i]:
Baz =
array([[ 1,  2,  6],
       [ 4,  8, 12]])
I could do a couple nested for-loops but I was wondering if there is a more elegant, numpy-ish way to do this.
Sorry if this is a bit convoluted. Let me know if I need to explain further.
Thanks!
You can use Bar as the row index and an array [0, 1, 2] as the column index:
# for easy copy-pasting
import numpy as np
Foo = np.array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])
# now use Bar as the `i` coordinate and 0, 1, 2 as the `j` coordinate:
Foo[Bar, [0, 1, 2]]
# array([[ 1,  2,  6],
#        [ 4,  8, 12]])
# OR, to generate the column indices [0, 1, 2] automatically:
Foo[Bar, np.arange(Bar.shape[1])]
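Since NumPy 1.15 the same selection can also be written with np.take_along_axis, which pairs each entry of Bar with the column it sits in. A sketch using the arrays from the question:

```python
import numpy as np

Foo = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])

# Baz[i, j] = Foo[Bar[i, j], j]: Bar supplies row indices along axis 0,
# and the column index is implied by Bar's own column position.
Baz = np.take_along_axis(Foo, Bar, axis=0)
print(Baz.tolist())  # [[1, 2, 6], [4, 8, 12]]
```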

Generating a boolean mask indexing one array into another array

It's hard to explain what I'm trying to do with words so here's an example.
Let's say we have the following inputs:
In [76]: x
Out[76]:
0 a
1 a
2 c
3 a
4 b
In [77]: z
Out[77]: ['a', 'b', 'c', 'd', 'e']
I want to get:
In [78]: ii
Out[78]:
array([[1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0]])
ii is an array of boolean masks which can be applied to z to get back the original x.
My current solution is a function that converts z to a list, uses its index method to find the position of an element in z, and generates a row of zeros with a one at that position. This function gets applied to each row of x to get the desired result.
A first possibility:
>>> choices = np.diag([1]*5)
>>> choices[[z.index(i) for i in x]]
As noted elsewhere, you can replace the list comprehension [z.index(i) for i in x] with np.searchsorted(z, x), provided z is sorted:
>>> choices[np.searchsorted(z, x)]
Note that, as suggested in a comment by @seberg, you should use np.eye(len(z)) instead of np.diag([1]*len(z)). The np.eye function directly gives you a 2D array with 1 on the diagonal and 0 elsewhere (as floats by default; pass dtype=int or dtype=bool if needed).
Here is a NumPy method for the case where z is sorted (you did not specify whether it is). I don't know whether pandas needs something different:
# Assuming z is sorted.
indices = np.searchsorted(z, x)
Now, I really don't know why you want a boolean mask: these indices can already be applied to z to give back x, and they are more compact.
np.asarray(z)[indices] == x  # all True if z contains every element of x
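Putting the two ideas together, a sketch that builds the mask from the searchsorted indices and an identity matrix, assuming z is sorted and contains every element of x (as in the example); `one_hot_rows` is a hypothetical name:

```python
import numpy as np


def one_hot_rows(x, z):
    """Hypothetical helper: row k is True only at the position of x[k] in z.

    Assumes z is sorted and contains every element of x.
    """
    x = np.asarray(x)
    z = np.asarray(z)
    idx = np.searchsorted(z, x)             # position of each x[k] in z
    return np.eye(len(z), dtype=bool)[idx]  # pick the matching identity rows


x = ['a', 'a', 'c', 'a', 'b']
z = ['a', 'b', 'c', 'd', 'e']
mask = one_hot_rows(x, z)
print(mask.astype(int).tolist())
# [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
```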
Surprised no one mentioned the outer method of numpy.equal (s below is the Series x from the question):
In [51]: np.equal.outer(s, z)
Out[51]:
array([[ True, False, False, False, False],
       [ True, False, False, False, False],
       [False, False,  True, False, False],
       [ True, False, False, False, False],
       [False,  True, False, False, False]], dtype=bool)
In [52]: np.equal.outer(s, z).astype(int)
Out[52]:
array([[1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0]])
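The table that np.equal.outer computes, mask[k, m] = (x[k] == z[m]), can also be written with plain broadcasting, which likewise does not require z to be sorted. A minimal sketch with plain NumPy arrays standing in for the Series:

```python
import numpy as np

x = np.array(['a', 'a', 'c', 'a', 'b'])
z = np.array(['a', 'b', 'c', 'd', 'e'])

# x[:, np.newaxis] has shape (5, 1) and z has shape (5,), so == broadcasts
# to a (5, 5) table: mask[k, m] is True where x[k] == z[m].
mask = x[:, np.newaxis] == z
print(mask.astype(int).tolist())
# [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
```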