numpy custom array element retrieval - numpy

I have a question regarding how to extract certain values from a 2D numpy array
Foo =
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
Bar =
array([[0, 0, 1],
[1, 2, 3]])
I want to extract elements from Foo using the values of Bar as row indices, so that I end up with a 2D matrix/array Baz of the same shape as Bar. The ith column of Baz corresponds to Foo[(np.array(each j in Bar[:,i]),np.array(i,i,i,i ...))], i.e. Baz[r, i] = Foo[Bar[r, i], i].
Baz =
array([[ 1, 2, 6],
[ 4, 8, 12]])
I could do a couple nested for-loops but I was wondering if there is a more elegant, numpy-ish way to do this.
Sorry if this is a bit convoluted. Let me know if I need to explain further.
Thanks!

You can use Bar as the row index and an array [0, 1, 2] as the column index:
# for easy copy-pasting
import numpy as np
Foo = np.array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])
# now use Bar as the `i` coordinate and 0, 1, 2 as the `j` coordinate:
Foo[Bar, [0, 1, 2]]
# array([[ 1, 2, 6],
# [ 4, 8, 12]])
# OR, to automatically generate the [0, 1, 2]
# (xrange exists only in Python 2; np.arange works in both Python 2 and 3)
Foo[Bar, np.arange(Bar.shape[1])]
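To see why this works, here is a minimal sketch that compares the indexing expression with an explicit element-by-element loop. The names `cols` and `expected` are mine and are not part of the answer. The key point is that the column index has shape (3,), which broadcasts against Bar's shape (2, 3), so each element of Bar is paired with the column it sits in.

```python
import numpy as np

Foo = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])

# Column index for every column of Bar; shape (3,) broadcasts against (2, 3).
cols = np.arange(Bar.shape[1])
Baz = Foo[Bar, cols]

# Reference result built one element at a time: Baz[r, c] == Foo[Bar[r, c], c].
expected = np.empty_like(Bar)
for r in range(Bar.shape[0]):
    for c in range(Bar.shape[1]):
        expected[r, c] = Foo[Bar[r, c], c]

print(Baz)
print(np.array_equal(Baz, expected))
```

The result always has the broadcast shape of the index arrays, which is why Baz comes out with the same shape as Bar.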

Related

Numpy Vectorization: add row above to current row on ndarray

I would like to add the values in the above row to the row below using vectorization. For example, if I had the ndarray,
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]]
Then after one iteration through this method, it would result in
[[0, 0, 0, 0],
[1, 1, 1, 1],
[3, 3, 3, 3],
[5, 5, 5, 5]]
One can simply do this with a for loop:
import numpy as np
def addAboveRow(arr):
    cpy = arr.copy()
    r, c = arr.shape
    for i in range(1, r):
        for j in range(c):
            cpy[i][j] += arr[i - 1][j]
    return cpy
ndarr = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]).reshape(4, 4)
print(addAboveRow(ndarr))
I'm not sure how to approach this using vectorization though. I think slicing should be used? Also, I'm not really sure how to deal with the top border, because nothing should be added to the first row. Any help would be appreciated. Thanks!
Note: I am really new to vectorization so an explanation would be great!
You can use slicing directly (a is the input array here):
b = np.zeros_like(a)
b[0] = a[0]
b[1:] = a[1:] + a[:-1]
>>> b
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[3, 3, 3, 3],
[5, 5, 5, 5]])
An alternative:
b = a.copy()
b[1:] += a[:-1]
Or:
b = a.copy()
np.add(b[1:], a[:-1], out=b[1:])
You could also try np.put; note that this modifies arr in place rather than returning a new array:
np.put(arr, np.arange(arr.shape[1], arr.size), arr[1:]+arr[:-1])
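As a rough check that the slice-based answers do the same thing as the loop in the question, the sketch below runs both on the example array. The function names are hypothetical, chosen for this illustration only. The first row needs no special handling because `out[1:]` never touches it.

```python
import numpy as np

def add_above_row_loop(arr):
    """Reference version with explicit loops, as in the question."""
    out = arr.copy()
    rows, cols = arr.shape
    for i in range(1, rows):
        for j in range(cols):
            out[i, j] += arr[i - 1, j]
    return out

def add_above_row_vectorized(arr):
    """Slice-based version: every row from 1 on gains the original row above it."""
    out = arr.copy()
    # arr itself is never modified, so each row is added in its original state.
    out[1:] += arr[:-1]
    return out

a = np.repeat(np.arange(4), 4).reshape(4, 4)
result = add_above_row_vectorized(a)
print(result)
print(np.array_equal(result, add_above_row_loop(a)))
```

Here `arr[1:]` is "all rows except the first" and `arr[:-1]` is "all rows except the last", so adding them lines each row up with the row above it.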

Numpy: Fast approach to extract rows from an array using an 2D index array

I have 2 arrays a and b:
N,D,V,W = 2,3,4,5
a = np.random.randint(0,V,N*D).reshape(N,D)
a
array([[2, 3, 3],
[2, 0, 3]])
b = np.random.randint(0,10,V*W).reshape(V,W)
b
array([[0, 1, 0, 5, 5],
[0, 3, 6, 8, 7],
[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1]])
What I need to do is to replace every element of array a with a row from array b using the array a element value as the row index of array b.
At the moment I'm doing it this way which works fine:
b[a.ravel(),:].reshape(*a.shape,-1)
array([[[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1],
[4, 6, 3, 3, 1]],
[[8, 8, 9, 0, 9],
[0, 1, 0, 5, 5],
[4, 6, 3, 3, 1]]])
However it seems this approach is a bit slow.
I tested it with:
N,D,V,W = 20000,64,100,256
and it took an average of 674 ms on my laptop (8 cores, 16 GB RAM).
Can someone please recommend a faster yet still simple approach?
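One thing that may be worth trying: indexing b directly with the 2D array a already gives a result of shape (N, D, W), so the ravel/reshape round trip is not needed, and np.take(b, a, axis=0) performs the same lookup. The sketch below only checks that the three forms agree on a small example. Whether either alternative is actually faster depends on array sizes, machine, and NumPy version, so it should be timed on the real data. The seeded generator is my own addition to make the example repeatable.

```python
import numpy as np

N, D, V, W = 2, 3, 4, 5
rng = np.random.default_rng(0)            # seeded only to make the sketch repeatable
a = rng.integers(0, V, size=(N, D))       # row indices into b
b = rng.integers(0, 10, size=(V, W))

original = b[a.ravel(), :].reshape(*a.shape, -1)   # approach from the question
direct = b[a]                                      # integer-array indexing on axis 0
taken = np.take(b, a, axis=0)                      # same lookup via np.take

print(direct.shape)
print(np.array_equal(original, direct), np.array_equal(original, taken))
```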

numpy: get indices where condition holds per row

I have an array such as the following:
In [70]: x
Out[70]:
array([[0, 1, 2],
[3, 4, 5]])
I am trying to get the indices per row where a condition holds, for example, x > 1.
Expected output is like ([2], [0, 1, 2])
I have tried numpy.where, numpy.nonzero, but they give strange results.
One approach -
r,c = np.where(x>1)
out = np.split(c, np.flatnonzero(r[1:] > r[:-1])+1)
Sample run -
In [140]: x
Out[140]:
array([[0, 2, 0, 1, 1],
[2, 2, 1, 2, 0],
[0, 2, 1, 1, 0],
[1, 0, 0, 2, 2]])
In [141]: r,c = np.where(x>1)
In [142]: np.split(c, np.flatnonzero(r[1:] > r[:-1])+1)
Out[142]: [array([1]), array([0, 1, 3]), array([1]), array([3, 4])]
Alternatively, we could use np.unique on the final step, like so -
np.split(c, np.unique(r, return_index=True)[1][1:])
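One caveat with both variants above: a row with no matching element never appears in r, so it produces no entry in the output list at all. If an empty array is wanted for such rows, a possible variation is to split on the cumulative per-row match counts instead. This is a sketch under that assumption; `indices_per_row` is a hypothetical helper name, not something from the answer.

```python
import numpy as np

def indices_per_row(mask):
    """Return a list with one column-index array per row of a 2D boolean mask."""
    counts = mask.sum(axis=1)        # number of matches in each row
    cols = np.nonzero(mask)[1]       # column indices, ordered row by row
    # Cut the flat list of column indices at the end of each row's matches.
    return np.split(cols, np.cumsum(counts)[:-1])

x = np.array([[0, 1, 2],
              [3, 4, 5]])
out = indices_per_row(x > 1)
print(out)

# A row without any match yields an empty array instead of being skipped.
y = np.array([[0, 0, 0],
              [2, 0, 3]])
out_with_empty = indices_per_row(y > 1)
print(out_with_empty)
```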

Splitting a number and assigning to elements in a row in a numpy array

How can I place a list of numbers into a 2D numpy array whose second dimension equals the number of digits of the largest number in the list? Positions that are not occupied by a digit of the original number should be zero in each row of the returned array.
Example:
From the list a = range(0,1001), how can I get a numpy array of the form below?
[[0,0,0,0],
[0,0,0,1],
[0,0,0,2],
...
[0,9,9,8]
[0,9,9,9],
[1,0,0,0]]
Please note how each number is placed right-aligned at the end of its row, as if written into an np.zeros((1001,4)) array.
NB: A pythonic, vectorized implementation is expected
Broadcasting again!
def split_digits(a):
    N = int(np.log10(np.max(a))+1) # No. of digits
    r = 10**np.arange(N,-1,-1) # 10-powered range array
    return (np.asarray(a)[:,None]%r[:-1])//r[1:]
Sample runs -
In [224]: a = range(0,1001)
In [225]: split_digits(a)
Out[225]:
array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 0, 2],
...,
[0, 9, 9, 8],
[0, 9, 9, 9],
[1, 0, 0, 0]])
In [229]: a = np.random.randint(0,1000000,(7))
In [230]: a
Out[230]: array([431921, 871855, 636144, 541186, 410562, 89356, 476258])
In [231]: split_digits(a)
Out[231]:
array([[4, 3, 1, 9, 2, 1],
[8, 7, 1, 8, 5, 5],
[6, 3, 6, 1, 4, 4],
[5, 4, 1, 1, 8, 6],
[4, 1, 0, 5, 6, 2],
[0, 8, 9, 3, 5, 6],
[4, 7, 6, 2, 5, 8]])
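Another approach, using pandas string methods:
import pandas as pd
def pir(a):
    z = int(np.log10(np.max(a)))  # number of digits of the largest value, minus one
    s = pd.Series(a.astype(str))
    zfilled = s.str.zfill(z + 1).sum()  # zero-pad each number, then concatenate all strings
    a_ = np.array(list(zfilled)).reshape(-1, z + 1)
    return a_.astype(int)
Using a random array like @Divakar's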
a = np.random.randint(0,1000000,(7))
array([ 57190, 29950, 392317, 592062, 460333, 639794, 983647])
pir(a)
array([[0, 5, 7, 1, 9, 0],
[0, 2, 9, 9, 5, 0],
[3, 9, 2, 3, 1, 7],
[5, 9, 2, 0, 6, 2],
[4, 6, 0, 3, 3, 3],
[6, 3, 9, 7, 9, 4],
[9, 8, 3, 6, 4, 7]])
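To make the broadcasting answer easier to follow, here is a small step-by-step sketch of the same modulo and floor-division idea. It differs from the answer in one respect, which is my own substitution: the digit count comes from the length of the decimal string of the maximum rather than from np.log10. The function and variable names are hypothetical, and non-negative integers are assumed.

```python
import numpy as np

def split_digits_sketch(values):
    """Split non-negative integers into zero-padded decimal digits, one row each."""
    values = np.asarray(values)
    n_digits = len(str(int(values.max())))        # digit count of the largest value
    powers = 10 ** np.arange(n_digits, -1, -1)    # e.g. [1000, 100, 10, 1]
    # Column k keeps only the part of the number from digit k rightwards ...
    remainders = values[:, None] % powers[:-1]
    # ... and integer division then discards everything right of digit k.
    return remainders // powers[1:]

digits = split_digits_sketch([7, 42, 305])
print(digits)
```

For 305, the remainders are [305, 5, 5] and dividing by [100, 10, 1] gives the digits [3, 0, 5]. Shorter numbers get leading zeros because their remainders are smaller than the divisor.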

Reduce a dimension of numpy array by selecting

I have a 3d array
A = np.random.random((4,4,3))
and a index matrix
B = np.int_(np.random.random((4,4))*3)
How do I get a 2D array from A based on index matrix B?
In general, how to get a N-1 dimensional array from a ND array and a N-1 dimensional index array?
Let's take an example:
>>> A = np.random.randint(0,10,(3,3,2))
>>> A
array([[[0, 1],
[8, 2],
[6, 4]],
[[1, 0],
[6, 9],
[7, 7]],
[[1, 2],
[2, 2],
[9, 7]]])
Use fancy indexing to pick out individual elements. Note that all index arrays must have the same shape (or broadcast to a common shape), and that shape is the shape of the result.
>>> ind = np.arange(2)
>>> A[ind,ind,ind]
array([0, 9]) #Index (0,0,0) and (1,1,1)
>>> ind = np.arange(2).reshape(2,1)
>>> A[ind,ind,ind]
array([[0],
[9]])
So for your example we need to supply the grid for the first two dimensions:
>>> A = np.random.random((4,4,3))
>>> B = np.int_(np.random.random((4,4))*3)
>>> A
array([[[ 0.95158697, 0.37643036, 0.29175815],
[ 0.84093397, 0.53453123, 0.64183715],
[ 0.31189496, 0.06281937, 0.10008886],
[ 0.79784114, 0.26428462, 0.87899921]],
[[ 0.04498205, 0.63823379, 0.48130828],
[ 0.93302194, 0.91964805, 0.05975115],
[ 0.55686047, 0.02692168, 0.31065731],
[ 0.92822499, 0.74771321, 0.03055592]],
[[ 0.24849139, 0.42819062, 0.14640117],
[ 0.92420031, 0.87483486, 0.51313695],
[ 0.68414428, 0.86867423, 0.96176415],
[ 0.98072548, 0.16939697, 0.19117458]],
[[ 0.71009607, 0.23057644, 0.80725518],
[ 0.01932983, 0.36680718, 0.46692839],
[ 0.51729835, 0.16073775, 0.77768313],
[ 0.8591955 , 0.81561797, 0.90633695]]])
>>> B
array([[1, 2, 0, 0],
[1, 2, 0, 1],
[2, 1, 1, 1],
[1, 2, 1, 2]])
>>> # indexing='ij' gives x[i,j] == i and y[i,j] == j, so the result is A[i, j, B[i,j]];
>>> # the default 'xy' ordering would swap the two and return A[j, i, B[i,j]] instead
>>> x,y = np.meshgrid(np.arange(A.shape[0]),np.arange(A.shape[1]),indexing='ij')
>>> x
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]])
>>> y
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
>>> A[x,y,B]
array([[ 0.37643036, 0.64183715, 0.31189496, 0.79784114],
[ 0.63823379, 0.05975115, 0.55686047, 0.74771321],
[ 0.14640117, 0.87483486, 0.86867423, 0.16939697],
[ 0.23057644, 0.46692839, 0.16073775, 0.90633695]])
If you prefer to use a mesh as suggested by Daniel, you may also use
A[tuple(np.ogrid[:A.shape[0], :A.shape[1]]) + (B,)]
to work with sparse indices. (Wrapping the np.ogrid result in tuple(...) before appending B keeps this working whether np.ogrid returns a list or a tuple.) In the general case you could use
A[tuple(np.ogrid[tuple(slice(0, end) for end in A.shape[:-1])]) + (B,)]
Note that this may also be used when you'd like to index by B an axis different from the last one (see for example this answer about inserting an element into a list).
Otherwise you can do it using broadcasting:
A[np.arange(A.shape[0])[:, np.newaxis], np.arange(A.shape[1])[np.newaxis, :], B]
This may be generalized too but it's a bit more complicated.
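For the general N-dimensional case, np.take_along_axis (available in NumPy 1.15 and later) may be a simpler route than building the index grids by hand. The sketch below is not from the answers above. It assumes B indexes the last axis of A, and it uses seeded random data of my own for repeatability.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded only for repeatability
A = rng.random((4, 4, 3))
B = rng.integers(0, 3, size=(4, 4))      # one index into the last axis per (i, j)

# B needs a trailing length-1 axis so it has as many dimensions as A;
# the final [..., 0] drops that axis again.
picked = np.take_along_axis(A, B[..., np.newaxis], axis=-1)[..., 0]

# Cross-check against the broadcasting form shown above.
rows = np.arange(A.shape[0])[:, np.newaxis]
cols = np.arange(A.shape[1])[np.newaxis, :]
reference = A[rows, cols, B]

print(picked.shape)
print(np.array_equal(picked, reference))
```

The same expression works unchanged for any number of leading dimensions, as long as B has the shape of A without its last axis.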