NumPy: how to reshape when some data is missing?

With the following source data -
In [53]: source_data = np.array([
...: [0, 0, 0, 10],
...: [0, 0, 1, 11],
...: [0, 1, 0, 12],
...: [0, 1, 1, 13],
...: [1, 0, 0, 14],
...: [1, 0, 1, 15],
...: [1, 1, 0, 16],
...: [1, 1, 1, 17]
...: ])
I can reshape as follows to make indexing more convenient -
In [62]: maxes = np.max(source_data, axis=0).astype(int)
In [63]: maxes
Out[63]: array([ 1, 1, 1, 17])
In [64]: three_d = source_data[:,3].reshape((maxes[0]+1, maxes[1]+1, maxes[2]+1))
In [65]: three_d
Out[65]:
array([[[10, 11],
[12, 13]],
[[14, 15],
[16, 17]]])
But if rows are missing from the source data, for example -
In [68]: source_data2 = np.array([
...: [0, 0, 0, 10],
...: [0, 0, 1, 11],
...: [0, 1, 1, 13],
...: [1, 1, 0, 16],
...: [1, 1, 1, 17]
...: ])
What is the most efficient way to transform it into the following?
array([[[10, 11],
[nan, 13]],
[[nan, nan],
[16, 17]]])

In [512]: source_data = np.array([
...: [0, 0, 0, 10],
...: [0, 0, 1, 11],
...: [0, 1, 0, 12],
...: [0, 1, 1, 13],
...: [1, 0, 0, 14],
...: [1, 0, 1, 15],
...: [1, 1, 0, 16],
...: [1, 1, 1, 17]
...: ])
The reshape works because the source_data is complete and in order; you are ignoring the coordinates in the first 3 columns.
But we can use them as indices:
In [513]: arr = np.zeros((2,2,2), int)
In [514]: arr[source_data[:,0], source_data[:,1], source_data[:,2]] = source_data[:,3]
In [515]: arr
Out[515]:
array([[[10, 11],
[12, 13]],
[[14, 15],
[16, 17]]])
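To see why the coordinates matter, here is a small check (reversing the rows is just an illustrative choice): the reversed data is still complete but no longer sorted, so the plain reshape puts values in the wrong cells while coordinate indexing does not.

```python
import numpy as np

source_data = np.array([
    [0, 0, 0, 10], [0, 0, 1, 11], [0, 1, 0, 12], [0, 1, 1, 13],
    [1, 0, 0, 14], [1, 0, 1, 15], [1, 1, 0, 16], [1, 1, 1, 17],
])
# Same rows in reverse order: still complete, no longer sorted.
shuffled = source_data[::-1]

# Plain reshape ignores the coordinate columns.
by_reshape = shuffled[:, 3].reshape(2, 2, 2)

# Coordinate indexing places each value where its first three columns say.
by_coords = np.zeros((2, 2, 2), int)
by_coords[shuffled[:, 0], shuffled[:, 1], shuffled[:, 2]] = shuffled[:, 3]

print(by_reshape[0, 0, 0])  # 17: whichever value happened to come first
print(by_coords[0, 0, 0])   # 10: the value whose coordinates are (0, 0, 0)
```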
We can do the same with the next source:
In [516]: source_data2 = np.array([
...: [0, 0, 0, 10],
...: [0, 0, 1, 11],
...: [0, 1, 1, 13],
...: [1, 1, 0, 16],
...: [1, 1, 1, 17]
...: ])
This time, fill the target with nan first (np.full with np.nan gives a float array, since an integer array cannot hold nan):
In [517]: arr = np.full((2,2,2), np.nan)
In [518]: arr
Out[518]:
array([[[nan, nan],
[nan, nan]],
[[nan, nan],
[nan, nan]]])
In [519]: arr[source_data2[:,0], source_data2[:,1], source_data2[:,2]] = source_data2[:,3]
In [520]: arr
Out[520]:
array([[[10., 11.],
[nan, 13.]],
[[nan, nan],
[16., 17.]]])
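The same idea can be wrapped in a small helper that works for any number of coordinate columns. This is a sketch under stated assumptions: `scatter_to_grid` is a hypothetical name, the last column holds the values, and the other columns hold non-negative integer coordinates. When no shape is given it is inferred from the largest coordinate on each axis, so if the highest index along some axis never appears, pass `shape` explicitly.

```python
import numpy as np

def scatter_to_grid(data, shape=None, fill=np.nan):
    """Place data[:, -1] into an N-d grid indexed by the other columns.

    Cells with no matching row keep `fill`; the grid is float so nan fits.
    """
    coords = data[:, :-1].astype(int)
    if shape is None:
        # Infer the grid size from the largest coordinate on each axis.
        shape = tuple(coords.max(axis=0) + 1)
    grid = np.full(shape, fill, dtype=float)
    # coords.T is one index array per axis; as a tuple it is an advanced index.
    grid[tuple(coords.T)] = data[:, -1]
    return grid

source_data2 = np.array([
    [0, 0, 0, 10],
    [0, 0, 1, 11],
    [0, 1, 1, 13],
    [1, 1, 0, 16],
    [1, 1, 1, 17],
])
grid = scatter_to_grid(source_data2)
print(grid)
```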

Related

How to iterate through slices at the last dimension

For example, take the array
a = np.array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
We want to iterate through the slices along the last dimension, i.e. [0,1,2], [3,4,5], [6,7,8], [9,10,11]. Is there any way to achieve this without a for loop? Thanks!
I tried this, but it does not work because NumPy does not interpret the tuple the way we wanted: a[(0, 0),:] is not the same as a[0, 0, :]
[a[i,:] for i in zip(*product(*(range(ii) for ii in a.shape[:-1])))]
More generally, is there any way to do this for the last k dimensions? Something equivalent to looping through a[i,j,k, ...].
In [26]: a = np.array([[[ 0, 1, 2],
...: [ 3, 4, 5]],
...:
...: [[ 6, 7, 8],
...: [ 9, 10, 11]]])
In [27]: [a[i,j,:] for i in range(2) for j in range(2)]
Out[27]: [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10, 11])]
or
In [31]: list(np.ndindex(2,2))
Out[31]: [(0, 0), (0, 1), (1, 0), (1, 1)]
In [32]: [a[i,j] for i,j in np.ndindex(2,2)]
Another option:
list(a.reshape(-1,3))
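For the more general "last k dimensions" part of the question, both ideas above generalise; the helper names below are hypothetical. It also explains why the attempt in the question failed: indexing with the tuple on its own, a[idx], means a[0, 0], whereas a[idx, :] turns idx into a fancy index along axis 0.

```python
import numpy as np

def last_k_slices(a, k):
    """List the sub-arrays spanning the last k axes, in C (row-major) order."""
    lead = a.shape[:a.ndim - k]
    # a[idx] with idx a tuple like (0, 0) means a[0, 0]; writing a[idx, :]
    # would instead treat idx as a fancy index along axis 0.
    return [a[idx] for idx in np.ndindex(*lead)]

def last_k_reshaped(a, k):
    """The same slices without a Python loop: one (-1, *trailing) reshape."""
    return a.reshape(-1, *a.shape[a.ndim - k:])

a = np.arange(12).reshape(2, 2, 3)   # the array from the question
print(last_k_slices(a, 1))
print(last_k_reshaped(a, 2).shape)   # (2, 2, 3)
```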

Swap values and indexes in numpy

I wonder whether this is possible. I have two 2D arrays:
X[7][9] = 10
Y[7][9] = 5
From the above I want to create the following two 2D arrays:
X'[5][10] = 9
Y'[5][10] = 7
Is it possible to accomplish this? The values of X and Y are bounded and won't exceed the shape of X and Y. Also, X and Y have the same shape.
Thanks in advance.
You should be able to use np.nditer to keep track of the multi-index and the corresponding values of the arrays.
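import numpy as np

rng = np.random.RandomState(0)
X = rng.randint(low=0, high=10, size=(10, 10))
Y = rng.randint(low=0, high=10, size=(10, 10))
# -1 marks cells that no (i, j) maps to
X_prime = np.full_like(X, -1)
Y_prime = np.full_like(Y, -1)
it = np.nditer([X, Y], flags=['multi_index'])
for x, y in it:
    i, j = it.multi_index
    # the value pair (x, y) becomes the index; the old index becomes the value
    X_prime[y, x] = j
    Y_prime[y, x] = i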
I believe this is the result you were expecting:
>>> X[7, 9], Y[7, 9]
(3, 9)
>>> X_prime[9, 3], Y_prime[9, 3]
(9, 7)
>>> X[1, 2], Y[1, 2]
(8, 2)
>>> X_prime[2, 8], Y_prime[2, 8]
(2, 1)
In [147]: X = np.random.randint(0,5,(5,5))
In [148]: Y = np.random.randint(0,5,(5,5))
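Similar to Matt's answer, but using ndindex to generate the indices. There are various ways of generating all such values; internally I believe ndindex uses nditer. Note that this version stores X_[X, Y] = i and Y_[X, Y] = j, a transposed convention compared with the question's X'[Y, X] = j, Y'[Y, X] = i: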
In [149]: X_,Y_ = np.zeros_like(X)-1,np.zeros_like(Y)-1
In [150]: for i,j in np.ndindex(*X.shape):
...: k,l = X[i,j], Y[i,j]
...: X_[k,l] = i
...: Y_[k,l] = j
...:
In [151]: X
Out[151]:
array([[2, 4, 3, 4, 2],
[0, 3, 0, 2, 3],
[1, 1, 4, 4, 4],
[2, 1, 2, 2, 0],
[0, 1, 0, 1, 4]])
In [152]: Y
Out[152]:
array([[1, 2, 1, 3, 0],
[4, 2, 4, 0, 4],
[4, 3, 3, 2, 1],
[0, 3, 0, 2, 2],
[1, 4, 2, 0, 0]])
In [153]: X_
Out[153]:
array([[-1, 4, 4, -1, 1],
[ 4, -1, -1, 3, 4],
[ 3, 0, 3, -1, -1],
[-1, 0, 1, -1, 1],
[ 4, 2, 2, 2, -1]])
In [154]: Y_
Out[154]:
array([[-1, 0, 2, -1, 2],
[ 3, -1, -1, 1, 1],
[ 2, 0, 3, -1, -1],
[-1, 2, 1, -1, 4],
[ 4, 4, 3, 2, -1]])
Notice that with randomly generated arrays the mapping is not complete (the -1 values). And if there are duplicates, the last one overwrites the earlier ones.
Handling duplicates by summing the indices instead (starting again from X_ and Y_ filled with -1); note the change in X_:
In [156]: for i,j in np.ndindex(*X.shape):
...: k,l = X[i,j], Y[i,j]
...: if X_[k,l]==-1:
...: X_[k,l] = i
...: Y_[k,l] = j
...: else:
...: X_[k,l] += i
...: Y_[k,l] += j
...:
...:
In [157]: X_
Out[157]:
array([[-1, 4, 7, -1, 2],
[ 4, -1, -1, 5, 6],
[ 7, 0, 3, -1, -1],
[-1, 0, 1, -1, 1],
[ 4, 2, 2, 2, -1]])
If the mapping is complete and one to one, it might be possible to do this mapping in a whole-array non-iterative fashion, which would be faster than this.
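A sketch of that whole-array version, keeping the ndindex answer's convention (X_[X, Y] = i, Y_[X, Y] = j) and its -1 marker for unmapped cells; the function names are hypothetical. The plain-assignment version is only safe without duplicates, because NumPy does not guarantee which of several advanced-index writes to the same cell survives. For the duplicate-summing variant, np.add.at accumulates every repeated index instead of overwriting.

```python
import numpy as np

def swap_vectorized(X, Y):
    """X_[X, Y] = i and Y_[X, Y] = j for every (i, j), without a Python loop.

    Assumes each (X[i, j], Y[i, j]) pair occurs at most once.
    """
    rows, cols = np.indices(X.shape)
    X_ = np.full_like(X, -1)
    Y_ = np.full_like(Y, -1)
    X_[X, Y] = rows
    Y_[X, Y] = cols
    return X_, Y_

def swap_summing_duplicates(X, Y):
    """Vectorized form of the duplicate-summing loop: repeated targets add up."""
    rows, cols = np.indices(X.shape)
    X_ = np.zeros_like(X)
    Y_ = np.zeros_like(Y)
    hits = np.zeros_like(X)
    np.add.at(X_, (X, Y), rows)   # unbuffered: every repeat is accumulated
    np.add.at(Y_, (X, Y), cols)
    np.add.at(hits, (X, Y), 1)
    X_[hits == 0] = -1            # cells nothing maps to, as in the loop
    Y_[hits == 0] = -1
    return X_, Y_
```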

Random valid data items in numpy array

Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n valid items from the array, including their indices.
Does numpy provide an efficient way of doing this?
Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
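Pick a random subset of size n by index (np.random.choice samples with replacement by default, which is why (3, 3) appears twice in the output below; pass replace=False to get n distinct items)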
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
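The same steps, written as a sketch that guarantees n distinct items. It swaps in np.argwhere (one (row, col) pair per valid entry) and a seeded np.random.default_rng Generator; the seed is an arbitrary choice for reproducibility.

```python
import numpy as np

data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9],
                 [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
rng = np.random.default_rng(0)

valid = np.argwhere(~np.isnan(data))                    # (row, col) of every non-nan entry
pick = rng.choice(len(valid), size=n, replace=False)    # n distinct positions
rows, cols = valid[pick].T
values = data[rows, cols]
print(rows, cols, values)
```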
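Find the coordinates of the valid (non-nan) entries. np.nonzero(data) alone would not do: it keeps the nan entries (nan is nonzero) and drops the valid 0, and reshaping its output with reshape(-1,2) mixes row and column indices. np.argwhere gives one (row, col) pair per row:
In [37]: a = np.argwhere(~np.isnan(data))
In [38]: a
Out[38]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 0],
[3, 3]])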
Now pick one of them at random:
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0],a[idx][1]]
Out[45]: 2.0

Indexing numpy array using another numpy array [duplicate]

Suppose I have a matrix A with some arbitrary values:
array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
And a matrix B which contains, for each row, column indices of elements in the same row of A:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
How do I select values from A pointed by B, i.e.:
A[B] = [[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]]
EDIT: np.take_along_axis, available since NumPy 1.15, is a built-in function for exactly this use case. See @hpaulj's answer below for how to use it.
You can use NumPy's advanced indexing -
A[np.arange(A.shape[0])[:,None],B]
One can also use linear indexing -
m,n = A.shape
out = np.take(A,B + n*np.arange(m)[:,None])
Sample run -
In [40]: A
Out[40]:
array([[2, 4, 5, 3],
[1, 6, 8, 9],
[8, 7, 0, 2]])
In [41]: B
Out[41]:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
In [42]: A[np.arange(A.shape[0])[:,None],B]
Out[42]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
In [43]: m,n = A.shape
In [44]: np.take(A,B + n*np.arange(m)[:,None])
Out[44]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
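Why the linear-indexing version works: in a C-ordered (m, n) array, element (i, j) sits at flat position i*n + j, so adding n*row to each row of B turns row-local column indices into flat positions. A small check against np.ravel_multi_index, which computes the same positions:

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
              [0, 3, 2, 1],
              [3, 2, 1, 0]])
m, n = A.shape
rows = np.arange(m)[:, None]          # shape (m, 1), broadcasts across B's columns

# Flat position of (i, j) in C order is i*n + j.
flat = B + n * rows
same = np.ravel_multi_index((np.broadcast_to(rows, B.shape), B), A.shape)

print(np.array_equal(flat, same))     # True
print(np.take(A, flat))
```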
More recent versions have added a take_along_axis function that does the job:
A = np.array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
np.take_along_axis(A, B, 1)
Out[]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
There's also a put_along_axis.
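put_along_axis is the scatter counterpart: it writes values to the positions an index array points at. As an illustration (the argsort round trip is an arbitrary choice, not from the answer), sorting each row with take_along_axis and writing the result back with put_along_axis restores A. This only inverts cleanly when each row of indices is a permutation; row 0 of the question's B repeats index 0, so it could not be undone this way.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])

# Gather: the column order that sorts each row.
order = np.argsort(A, axis=1)
sorted_rows = np.take_along_axis(A, order, axis=1)

# Scatter: write the sorted values back to the positions they came from.
restored = np.empty_like(A)
np.put_along_axis(restored, order, sorted_rows, axis=1)

print(sorted_rows)
print(np.array_equal(restored, A))  # True
```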
I know this is an old question, but another way of doing it using indices is:
A[np.indices(B.shape)[0], B]
output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]
The following solution uses a for loop:
outlist = []
for i in range(len(B)):
    lst = []
    for j in range(len(B[i])):
        lst.append(A[i][B[i][j]])
    outlist.append(lst)
outarray = np.asarray(outlist)
print(outarray)
The above can also be written more succinctly as a list comprehension:
outlist = [ [A[i][B[i][j]] for j in range(len(B[i]))]
for i in range(len(B)) ]
outarray = np.asarray(outlist)
print(outarray)
Output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]

Extract blocks or patches from NumPy Array

I have a 2-d numpy array as follows:
a = np.array([[1,5,9,13],
[2,6,10,14],
[3,7,11,15],
[4,8,12,16]])
I want to extract it into patches of size 2 by 2 without repeating the elements.
The result should be exactly the following. It can be a 3-d array or a list, with the elements in the same order as below:
[[[1,5],
[2,6]],
[[3,7],
[4,8]],
[[9,13],
[10,14]],
[[11,15],
[12,16]]]
How can I do it easily?
In my real problem the size of a is (36, 72), so I cannot do it by hand. I want a programmatic way of doing it.
Using scikit-image:
import numpy as np
from skimage.util import view_as_blocks
a = np.array([[1,5,9,13],
[2,6,10,14],
[3,7,11,15],
[4,8,12,16]])
print(view_as_blocks(a, (2, 2)))
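If scikit-image is not available, a NumPy-only alternative (a swapped-in technique, assuming NumPy 1.20 or later for sliding_window_view) is to take all overlapping 2x2 windows and keep every second one along each axis. Like view_as_blocks, the result is a 4D read-only view indexed [block_row, block_col].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[1, 5, 9, 13],
              [2, 6, 10, 14],
              [3, 7, 11, 15],
              [4, 8, 12, 16]])
b0, b1 = 2, 2
# All overlapping b0 x b1 windows have shape (3, 3, 2, 2); stepping by the
# block size keeps only the non-overlapping ones, shape (2, 2, 2, 2).
blocks = sliding_window_view(a, (b0, b1))[::b0, ::b1]
print(blocks.shape)
print(blocks[0, 1])  # block at block-row 0, block-column 1
```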
You can achieve it with a combination of np.reshape and np.swapaxes like so -
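import numpy as np

def extract_blocks(a, blocksize, keep_as_view=False):
    M, N = a.shape
    b0, b1 = blocksize
    # (M, N) -> (block_row, row_in_block, block_col, col_in_block), then put the block axes first
    blocks = a.reshape(M//b0, b0, N//b1, b1).swapaxes(1, 2)
    if not keep_as_view:
        # merging the two block axes usually needs a copy: 3D array of blocks
        return blocks.reshape(-1, b0, b1)
    else:
        # 4D view into a, indexed [block_row, block_col]
        return blocks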
As can be seen, there are two ways to use it: with the keep_as_view flag turned off (the default) or on. With keep_as_view=False, we reshape the swapped axes into a final 3D output, while with keep_as_view=True, we keep it 4D, which is a view into the input array and hence virtually free at runtime. We will verify this with a sample run later on.
Sample cases
Let's use a sample input array, like so -
In [94]: a
Out[94]:
array([[2, 2, 6, 1, 3, 6],
[1, 0, 1, 0, 0, 3],
[4, 0, 0, 4, 1, 7],
[3, 2, 4, 7, 2, 4],
[8, 0, 7, 3, 4, 6],
[1, 5, 6, 2, 1, 8]])
Now, let's use some block-sizes for testing. Let's use a blocksize of (2,3) with the view-flag turned off and on -
In [95]: extract_blocks(a, (2,3)) # Blocksize : (2,3)
Out[95]:
array([[[2, 2, 6],
[1, 0, 1]],
[[1, 3, 6],
[0, 0, 3]],
[[4, 0, 0],
[3, 2, 4]],
[[4, 1, 7],
[7, 2, 4]],
[[8, 0, 7],
[1, 5, 6]],
[[3, 4, 6],
[2, 1, 8]]])
In [48]: extract_blocks(a, (2,3), keep_as_view=True)
Out[48]:
array([[[[2, 2, 6],
[1, 0, 1]],
[[1, 3, 6],
[0, 0, 3]]],
[[[4, 0, 0],
[3, 2, 4]],
[[4, 1, 7],
[7, 2, 4]]],
[[[8, 0, 7],
[1, 5, 6]],
[[3, 4, 6],
[2, 1, 8]]]])
Verify view with keep_as_view=True
In [20]: np.shares_memory(a, extract_blocks(a, (2,3), keep_as_view=True))
Out[20]: True
Let's check out performance on a large array and verify the virtually free runtime claim as discussed earlier -
In [42]: a = np.random.rand(2000,3000)
In [43]: %timeit extract_blocks(a, (2,3), keep_as_view=True)
1000000 loops, best of 3: 801 ns per loop
In [44]: %timeit extract_blocks(a, (2,3), keep_as_view=False)
10 loops, best of 3: 29.1 ms per loop
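Note that extract_blocks lists the blocks row by row, whereas the expected output in the question goes down each block column first ([[1,5],[2,6]], then [[3,7],[4,8]]). If that exact order is wanted, a sketch of a variant (extract_blocks_colmajor is a hypothetical name) uses transpose(2, 0, 1, 3) instead of swapaxes(1, 2), so the block-row axis varies fastest. Like the original, it assumes the block size divides the array shape.

```python
import numpy as np

def extract_blocks_colmajor(a, blocksize):
    """Non-overlapping blocks, ordered down each block column first."""
    M, N = a.shape
    b0, b1 = blocksize
    # Axes after the reshape: (block_row, row_in_block, block_col, col_in_block).
    # Reordering to (block_col, block_row, ...) makes block_row vary fastest.
    return a.reshape(M // b0, b0, N // b1, b1).transpose(2, 0, 1, 3).reshape(-1, b0, b1)

a = np.array([[1, 5, 9, 13],
              [2, 6, 10, 14],
              [3, 7, 11, 15],
              [4, 8, 12, 16]])
print(extract_blocks_colmajor(a, (2, 2)))
```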
Here's a rather cryptic numpy one-liner to generate your 3-d array, called result1 here:
In [60]: x
Out[60]:
array([[2, 1, 2, 2, 0, 2, 2, 1, 3, 2],
[3, 1, 2, 1, 0, 1, 2, 3, 1, 0],
[2, 0, 3, 1, 3, 2, 1, 0, 0, 0],
[0, 1, 3, 3, 2, 0, 3, 2, 0, 3],
[0, 1, 0, 3, 1, 3, 0, 0, 0, 2],
[1, 1, 2, 2, 3, 2, 1, 0, 0, 3],
[2, 1, 0, 3, 2, 2, 2, 2, 1, 2],
[0, 3, 3, 3, 1, 0, 2, 0, 2, 1]])
In [61]: result1 = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2).swapaxes(1, 2).reshape(-1, 2, 2)
result1 is like a 1-d array of 2-d arrays:
In [68]: result1.shape
Out[68]: (20, 2, 2)
In [69]: result1[0]
Out[69]:
array([[2, 1],
[3, 1]])
In [70]: result1[1]
Out[70]:
array([[2, 2],
[2, 1]])
In [71]: result1[5]
Out[71]:
array([[2, 0],
[0, 1]])
In [72]: result1[-1]
Out[72]:
array([[1, 2],
[2, 1]])
(Sorry, I don't have time at the moment to give a detailed breakdown of how it works. Maybe later...)
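A step-by-step sketch of what the one-liner does, using a small 4x6 array in place of x (an arbitrary choice so the shapes are easy to follow):

```python
import numpy as np

x = np.arange(24).reshape(4, 6)

# 1. Split each axis into (number of blocks, block size):
#    step1[bi, r, bj, c] == x[2*bi + r, 2*bj + c]
step1 = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2)   # (2, 2, 3, 2)

# 2. Bring the two block-index axes to the front:
#    step2[bi, bj, r, c] is row r, column c of block (bi, bj)
step2 = step1.swapaxes(1, 2)                                 # (2, 3, 2, 2)

# 3. Merge the two block-index axes into one, in row-major block order.
#    Here this makes a copy, because step2 is no longer contiguous.
result = step2.reshape(-1, 2, 2)                             # (6, 2, 2)

print(result[1])  # block (0, 1): rows 0-1, columns 2-3 of x
```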
Here's a less cryptic version that uses a nested list comprehension. In this case, result2 is a python list of 2-d numpy arrays:
In [73]: result2 = [x[2*j:2*j+2, 2*k:2*k+2] for j in range(x.shape[0]//2) for k in range(x.shape[1]//2)]
In [74]: result2[5]
Out[74]:
array([[2, 0],
[0, 1]])
In [75]: result2[-1]
Out[75]:
array([[1, 2],
[2, 1]])