I have a NumPy array A of shape (s1, ..., sm) with integer entries and a dictionary D with integers as keys and NumPy arrays of shape (t,) as values. I would like to evaluate the dictionary on every entry of A to get a new array B of shape (s1, ..., sm, t).
For example
D={1:[0,1],2:[1,0]}
A=np.array([1,2,1])
The output should be
array([[0,1],[1,0],[0,1]])
Motivation: I have an array with indexes of unit vectors as entries and I need to transform it into an array with the vectors as entries.
If you can rename your keys to be 0-indexed, you might use direct array querying on your unit vectors:
>>> units = np.array([D[1], D[2]])
>>> B = units[A - 1] # -1 because 0 indexed: 1 -> 0, 2 -> 1
>>> B
array([[0, 1],
       [1, 0],
       [0, 1]])
And similarly for any shape:
>>> A = np.random.randint(0, 2, (10, 11, 12))  # entries in {0, 1}
>>> A.shape
(10, 11, 12)
>>> B = units[A]
>>> B.shape
(10, 11, 12, 2)
You can learn more about advanced indexing in the NumPy documentation.
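If the keys cannot be renamed, one option is to build a lookup table that is indexed directly by key. The sketch below assumes the keys are non-negative integers small enough for a dense table; the names `keys`, `values`, and `lookup` are illustrative and do not come from the answer above.

```python
import numpy as np

D = {1: [0, 1], 2: [1, 0]}
A = np.array([[1, 2, 1], [2, 2, 1]])

keys = np.array(list(D.keys()))
values = np.array(list(D.values()))

# Dense table: row k holds D[k]. Rows for keys missing from D stay zero
# (an assumption of this sketch, not something the question specifies).
lookup = np.zeros((keys.max() + 1, values.shape[1]), dtype=values.dtype)
lookup[keys] = values

B = lookup[A]  # shape is A.shape + (t,)
```

This trades some memory (one row per integer up to the largest key) for a single indexing operation that works for any shape of A.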
>>> np.asarray([D[key] for key in A])
array([[0, 1],
       [1, 0],
       [0, 1]])
Here's an approach that uses np.searchsorted to locate, for each entry of A, the position of its key, and then indexes into the dictionary's values to get the desired output. It assumes the keys are stored in sorted order; on Python 3, D.keys() and D.values() are views, so they are converted to lists first:
idx = np.searchsorted(list(D.keys()), A)
out = np.asarray(list(D.values()))[idx]
Sample run -
In [45]: A
Out[45]: array([1, 2, 1])
In [46]: D
Out[46]: {1: [0, 1], 2: [1, 0]}
In [47]: idx = np.searchsorted(list(D.keys()), A)
    ...: out = np.asarray(list(D.values()))[idx]
    ...:
In [48]: out
Out[48]:
array([[0, 1],
       [1, 0],
       [0, 1]])
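When the keys are not stored in sorted order, the same idea can still be used by passing a sorting permutation to np.searchsorted through its `sorter` argument. This is a minimal sketch; it assumes every entry of A is a key of D, and the names `order` and `pos` are illustrative.

```python
import numpy as np

D = {5: [0, 1], 2: [1, 0]}  # keys deliberately not in sorted order
A = np.array([5, 2, 5])

keys = np.array(list(D.keys()))
values = np.array(list(D.values()))

order = np.argsort(keys)                      # permutation that sorts the keys
pos = np.searchsorted(keys, A, sorter=order)  # positions within the sorted keys
out = values[order[pos]]                      # map back to the original order
```

Entries of A that are missing from D are not detected here; they would silently map to a neighbouring key or raise an IndexError.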
I want to use a 2D array which contains k-index values to quickly fill a 3D array with different mask values above/below each k-index. Only non-zero boundary indices will be used to fill.
Initialize 2D k-index array and extract valid i-j index arrays:
import numpy as np
boundary_indices = np.array([[0, 1, 2], [1, 2, 1], [0, 2, 0]])
ii, jj = np.where(boundary_indices > 0) # determine desired indices
kk = boundary_indices[ii, jj] # align boundary indices with valid indices
Yields:
boundary_indices = array([[0, 1, 2],
                          [1, 2, 1],
                          [0, 2, 0]])
ii = array([0, 0, 1, 1, 1, 2])
jj = array([1, 2, 0, 1, 2, 1])
kk = array([1, 2, 1, 2, 1, 2])
Loop through the indices and populate the output array:
output = np.zeros((3, 3, 3), dtype=np.int64)
for i, j, k in zip(ii, jj, kk):
    output[i, j, :k] = 7  # fill region above
    output[i, j, k:] = 8  # fill region below
While this does yield the correct results, it becomes quite slow once the size of the array increases significantly:
output[:, :, 0] = [[0, 7, 7],
                   [7, 7, 7],
                   [0, 7, 0]]
output[:, :, 1] = [[0, 8, 7],
                   [8, 7, 8],
                   [0, 7, 0]]
output[:, :, 2] = [[0, 8, 8],
                   [8, 8, 8],
                   [0, 8, 0]]
Is there a more efficient way to do this?
Tried output[ii, jj, kk] = 8 but that only imprints the boundary on the output array and not the regions above/below.
I was hoping that there would be some fancy-indexing magic and that something like this would work:
output[ii, jj, :kk] = 7
output[ii, jj, kk:] = 8
But it raises a TypeError: only integer scalar arrays can be converted to a scalar index
For this kind of operation, Numba and Cython can be used to produce efficient code. Here is an example with Numba:
import numba as nb
# `parallel=True` can be added here for large arrays
@nb.njit('int64[:,:,::1](int64[:], int64[:], int64[:])')
def compute(ii, jj, kk):
    output = np.zeros((3, 3, 3), dtype=np.int64)
    n = output.shape[2]
    # `for idx in prange(ii.size)` can be used here for large array
    for i, j, k in zip(ii, jj, kk):
        # `i, j, k = ii[idx], jj[idx], kk[idx]` can be used here for large array
        for l in range(k):  # fill region above
            output[i, j, l] = 7
        for l in range(k, n):  # fill region below
            output[i, j, l] = 8
    return output
# Either kk needs to be converted to an int64-based array with kk.astype(np.int64)
# or boundary_indices needs to be an int64-based array in the first place.
output = compute(ii, jj, kk)
Note that the Numba function can be faster if ii and jj are contiguous; somewhat surprisingly, they are not contiguous when retrieved from np.where. I also assume that kk is a 64-bit integer array; you can change the signature (the string in the Numba decorator) to support 32-bit arrays. Numba can also compile the function lazily, based on the types provided at runtime, but this introduces a significant overhead during the first function call.
This code is significantly faster, especially for large arrays, thanks to Numba's just-in-time compilation. The loop can be parallelized using prange and the parallel=True decorator flag, although the current code should already be pretty good. Finally, note that you can perform the np.where(boundary_indices > 0) test on the fly inside the Numba loop, which avoids creating potentially expensive temporary arrays.
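If adding a compiler dependency is not an option, the fill can also be written in plain NumPy with broadcasting. This is a different technique from the Numba answer above, shown as a sketch; the names `levels` and `k` are illustrative, and the depth `n = 3` is taken from the question's example.

```python
import numpy as np

boundary_indices = np.array([[0, 1, 2], [1, 2, 1], [0, 2, 0]])
n = 3  # depth of the output along the k axis

levels = np.arange(n)             # shape (n,)
k = boundary_indices[:, :, None]  # shape (3, 3, 1), broadcasts against levels

# 7 above the boundary, 8 at and below it, 0 where the boundary index is 0
output = np.where(k > 0, np.where(levels < k, 7, 8), 0)
```

This builds the whole (3, 3, n) result in one pass with no Python loop, at the cost of a few temporary arrays of the same shape as the output.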
I have two 2D NumPy arrays, A and B.
I want to remove all the rows in A which appear in B.
I tried something like this:
A[~np.isin(A, B)]
but isin keeps the dimensions of A, whereas I need one boolean value per row to filter it.
EDIT: something like this
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
.....
A = np.array([[3, 0, 4],
[0, 5, 9]])
Probably not the most performant solution, but does exactly what you want. You can change the dtype of A and B to be a unit consisting of one row. You need to ensure that the arrays are contiguous first, e.g. with ascontiguousarray:
Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()
Now you can apply np.isin directly:
>>> np.isin(Av, Bv)
array([False, True, False])
According to the docs, invert=True is faster than negating the output of isin, so you can do
A[np.isin(Av, Bv, invert=True)]
Try the following - it uses matrix multiplication for dimensionality reduction:
import numpy as np
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
print (A[~np.isin(A.dot(arr_max), B.dot(arr_max))])
Output:
[[3 0 4]
[0 5 9]]
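One caveat: the dot product with arr_max does not always give each row a distinct value, so two different rows can collide. For non-negative integer arrays, a mixed-radix encoding avoids this. The sketch below uses np.ravel_multi_index for the encoding; the helper name `remove_rows` is hypothetical, and the approach assumes the product of the per-column ranges fits in a native integer.

```python
import numpy as np

def remove_rows(A, B):
    """Drop the rows of A that also appear in B (non-negative integers only)."""
    # Per-column radix: one more than the largest value seen in that column.
    dims = tuple(int(d) for d in np.maximum(A.max(0), B.max(0)) + 1)
    # Encode each row as a single integer that is unique for the given dims.
    codes_A = np.ravel_multi_index(A.T, dims)
    codes_B = np.ravel_multi_index(B.T, dims)
    return A[~np.isin(codes_A, codes_B)]

A = np.array([[3, 0, 4], [3, 1, 1], [0, 5, 9]])
B = np.array([[1, 1, 1], [3, 1, 1]])
result = remove_rows(A, B)

# A pair where the plain dot product collides: 3*4 + 0*6 == 0*4 + 2*6
A2 = np.array([[3, 0], [0, 5]])
B2 = np.array([[0, 2]])
result2 = remove_rows(A2, B2)  # keeps both rows of A2
```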
This is certainly not the most performant solution but it is relatively easy to read:
A = np.array([row for row in A if row not in B])
Edit:
I found that the code above does not work correctly, but this does:
A = [row for row in A if not any(np.equal(B, row).all(1))]
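The first version fails because `row in B` on a NumPy array is true as soon as any single element matches, not only when a whole row matches. The row-wise comparison in the corrected line can also be done for all rows at once with broadcasting. This is a sketch under the assumption that A and B have the same number of columns; `matches` and `filtered` are illustrative names.

```python
import numpy as np

A = np.array([[3, 0, 4], [3, 1, 1], [0, 5, 9]])
B = np.array([[1, 1, 1], [3, 1, 1]])

# matches[i, j] is True when row i of A equals row j of B
matches = (A[:, None, :] == B[None, :, :]).all(axis=-1)

# Keep the rows of A that match no row of B
filtered = A[~matches.any(axis=1)]
```

The intermediate comparison has shape (len(A), len(B), ncols), so this suits small and medium inputs better than very large ones.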
Could someone explain to me why the second assertion below fails? I do not understand why using a slice or a range for indexing would make a difference in this case.
import numpy as np
d = np.zeros(shape = (1,2,3))
assert d[:, 0, slice(0,2)].shape == d[:, 0, range(0,2)].shape #This doesn't trigger an exception as both operands return (1,2)
assert d[0, :, slice(0,2)].shape == d[0, :, range(0,2)].shape #This does because (1,2) != (2,1)...
Make the array more diagnostic:
In [66]: d = np.arange(6).reshape(1,2,3)
In [67]: d
Out[67]:
array([[[0, 1, 2],
        [3, 4, 5]]])
scalar index in the middle:
In [68]: d[:,0,:2]
Out[68]: array([[0, 1]])
In [69]: d[:,0,range(2)]
Out[69]: array([[0, 1]])
Shape is (1,2) for both, though the 2nd is a copy because of the advanced indexing of the last dimension.
Shape is the same in the 2nd set, but the order actually differs:
In [70]: d[0,:,:2]
Out[70]:
array([[0, 1],
       [3, 4]])
In [71]: d[0,:,range(2)]
Out[71]:
array([[0, 3],
       [1, 4]])
[71] is a case of mixed basic and advanced indexing, which is documented as behaving unexpectedly. Here the scalar 0 and range(2) both act as advanced indices and are separated by a slice, so the middle sliced dimension is put last.
https://numpy.org/doc/stable/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
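A short sketch of the difference, and of one way to keep the original axis order by indexing in two steps. The variable names are illustrative.

```python
import numpy as np

d = np.arange(6).reshape(1, 2, 3)

basic = d[0, :, :2]         # basic indexing only
mixed = d[0, :, [0, 1]]     # 0 and [0, 1] are separated by a slice
two_step = d[0][:, [0, 1]]  # index in two steps to keep the axis order

print(basic.shape, mixed.shape, two_step.shape)
print(np.array_equal(mixed, basic.T))   # mixed is the transpose of basic
print(np.array_equal(two_step, basic))  # two-step indexing matches basic
```

Splitting the index means the list is the only advanced index in the second step, so its dimension stays in place.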
I have a 2D array and a boolean mask of the same size. I want to use the mask to coalesce consecutive rows in the 2D array, where by coalesce I mean reducing each run of masked rows to its first occurrence. An example:
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
Expected output:
array([[0, 0],
       [2, 2],
       [3, 3],
       [4, 4]])
And to illustrate how the output might be obtained:
array([[0, 0],               array([[0, 0],               array([[0, 0],
       [1, 1],                      [0, 0],                      [2, 2],
       [2, 2],  -> select ->        [2, 2],  -> reduce ->        [3, 3],
       [3, 3],                      [3, 3],                      [4, 4]])
       [4, 4],                      [4, 4],
       [5, 5]])                     [4, 4]])
What I have tried:
rows[~mask].reshape(-1,2)
But this will only select the rows which should not be reduced.
Upgraded answer
I realized that my initial submission did a lot of unnecessary operations. Given a mask such as
mask = [1,1,0,0,1,1,0,0,1,1,1,0]
You simply want to negate the leading ones:
#negate:v       v       v
mask = [0,1,0,0,0,1,0,0,0,1,1,0]
then negate the mask to get your wanted rows. This way is MUCH more efficient than doing a forward fill on indices and removing repeated indices (see old answer). Revised solution:
import numpy as np
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
def maskforwardfill(a: np.ndarray, mask: np.ndarray):
    mask = mask.copy()
    mask[1:] = mask[1:] & mask[:-1]  # Negate leading True values
    mask[0] = False  # First element should always be False, either it is False anyways, or it is a leading True value (which should be set to False)
    return a[~mask]  # index out wanted rows
# Reduce mask's dimension since I assume that you only do complete rows
print(maskforwardfill(rows, mask.any(1)))
#[[0 0]
# [2 2]
# [3 3]
# [4 4]]
Old answer
Here I assume that you only need complete rows (as in @Arne's answer). My idea is that given the mask and the corresponding array indices
mask = [1,1,0,0,1,1]
indices = [0,1,2,3,4,5]
you can use np.diff to first obtain
indices = [0,-1,2,3,4,-1]
Then a forward fill (where -1 acts as nan) on the indices such that you get
[0,0,2,3,4,4]
on which you can use np.unique to remove repeated indices:
[0,2,3,4] # The rows indices you want
Code:
import numpy as np
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
def maskforwardfill(a: np.ndarray, mask: np.ndarray):
    mask = mask.copy()
    indices = np.arange(len(a))
    mask[np.diff(mask,prepend=[0]) == 1] = False  # set leading True to False
    indices[mask] = -1
    indices = np.maximum.accumulate(indices)  # forward fill indices
    indices = np.unique(indices)  # remove repeats
    return a[indices]  # index out wanted rows
# Reduce mask's dimension since I assume that you only do complete rows
print(maskforwardfill(rows, mask.any(1)))
#[[0 0]
# [2 2]
# [3 3]
# [4 4]]
Assuming it's always about complete rows, you can reduce the mask to one dimension. Then a straightforward approach is to iterate over the rows:
# reduce mask to one dimension for row selection
mask_1d = mask.any(axis=1)
# replace rows with previous ones based on mask
for i in range(1, len(rows)):
    if mask_1d[i-1] and mask_1d[i]:
        rows[i] = rows[i-1]
# leave out repeated rows
reduced = [rows[0]]
for i in range(1, len(rows)):
    if not (rows[i] == rows[i-1]).all():
        reduced.append(rows[i])
reduced = np.array(reduced)
reduced
array([[0, 0],
[2, 2],
[3, 3],
[4, 4]])
val = np.array([[1, 3], [2, 5], [0, 6], [1, 2] ])
print(np.max(val))
6
I also want to print the row [0, 6]. With axis, np.max returns the values from the other rows as well, and argmax doesn't return the row index.
One way is to use np.where, which returns the indices where the condition is true:
r,_ = np.where(val == np.max(val))
val[r]
Output:
array([[0, 6]])
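argmax can also be used here: without an axis it returns an index into the flattened array, which np.unravel_index converts back to a (row, column) pair. A minimal sketch, with illustrative variable names:

```python
import numpy as np

val = np.array([[1, 3], [2, 5], [0, 6], [1, 2]])

flat = np.argmax(val)                         # index into the flattened array
row, col = np.unravel_index(flat, val.shape)  # convert to (row, column)
max_row = val[row]

print(max_row)
```

Unlike the np.where version, this returns only the first row containing the maximum if the maximum occurs more than once.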