Select elements of a numpy array based on the elements of a second array - numpy

Consider a numpy array A of shape (7,6)
A = array([[0, 1, 2, 3, 5, 8],
[4, 100, 6, 7, 8, 7],
[8, 9, 10, 11, 5, 4],
[12, 13, 14, 15, 1, 2],
[1, 3, 5, 6, 4, 8],
[12, 23, 12, 24, 4, 3],
[1, 3, 5, 7, 89, 0]])
together with a second numpy array r of the same shape which contains the radius of A starting from a central point A(3,2)=0:
r = array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
I would like to pick up all the elements of A which are located at the position 1 of r, i.e. [9,10,11,15,4,6,5,13], all the elements of A located at position 2 of r and so on. I there some numpy function to do that?
Thank you

You can select a section of A by doing something like A[r == 1], to get all the sections as a list you could do [A[r == i] for i in range(r.max() + 1)]. This will work, but may be inefficient depending on how big the values in r go because you need to compute r == i for every i.
You could also use this trick, first sort A based on r, then simply split the sorted A array at the right places. That looks something like this:
r_flat = r.ravel()
order = r_flat.argsort()
A_sorted = A.ravel()[order]
r_sorted = r_flat[order]
edges = r_sorted.searchsorted(np.arange(r_sorted[-1] + 1), 'right')
sections = []
start = 0
for end in edges:
sections.append(A_sorted[start:end])
start = end

I get a different answer to the one you were expecting (3 not 4 from the 4th row) and the order is slightly different (strictly row then column), but:
>>> A
array([[ 0, 1, 2, 3, 5, 8],
[ 4, 100, 6, 7, 8, 7],
[ 8, 9, 10, 11, 5, 4],
[ 12, 13, 14, 15, 1, 2],
[ 1, 3, 5, 6, 4, 8],
[ 12, 23, 12, 24, 4, 3],
[ 1, 3, 5, 7, 89, 0]])
>>> r
array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
>>> A[r==1]
array([ 9, 10, 11, 13, 15, 3, 5, 6])
Alternatively, you can get column then row ordering by transposing both arrays:
>>> A.T[r.T==1]
array([ 9, 13, 3, 10, 5, 11, 15, 6])

Related

Numpy array value change via two index sets

I am trying to achieve the following:
# Before
raw = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Set values to 10
indice_set1 = np.array([0, 2, 4])
indice_set2 = np.array([0, 1])
raw[indice_set1][indice_set2] = 10
# Result
print(raw)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But the raw values remain exactly the same.
Expecting this:
# After
raw = np.array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])
After doing raw[indice_set1] you get a new array, which is the one you modify with the second slicing, not raw.
Instead, slice the slicer:
raw[indice_set1[indice_set2]] = 10
Modified raw:
array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])

numpy / pandas array comparison with multiple values in other array

I have an array
a = np.arange(0, 100)
and another array with some cut-off points
b = np.array([5, 8, 15, 35, 76])
I want to create an array such that
c = [0, 0, 0, 0, 1, 1, 1, 2, 2, ..., 4, 4, 5]
Is there an elegant / fast way to do this? Possible in Pandas?
Here's a compact way -
(a[:,None]>=b).sum(1)
Another with cumsum -
p = np.zeros(len(a),dtype=int)
p[b] = 1
out = p.cumsum()
Another with searchsorted -
np.searchsorted(b,a,'right')
Another with repeat -
np.repeat(range(len(b)+1),np.ediff1d(b,to_begin=b[0],to_end=len(a)-b[-1]))
Another with isin and cumsum -
np.isin(a,b).cumsum()
Here is one way cut
pd.cut(a,[-np.Inf]+b.tolist()+[np.Inf]).codes
Out[383]:
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5], dtype=int8)

Numpy: Fast approach to extract rows from an array using an 2D index array

I have 2 arrays a and b:
N,D,V,W = 2,3,4,5
a = np.random.randint(0,V,N*D).reshape(N,D)
a
array([[2, 3, 3],
[2, 0, 3]])
b = np.random.randint(0,10,V*W).reshape(V,W)
b
array([[0, 1, 0, 5, 5],
[0, 3, 6, 8, 7],
[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1]])
What I need to do is to replace every element of array a with a row from array b using the array a element value as the row index of array b.
At the moment I'm doing it this way which works fine:
b[a.ravel(),:].reshape(*a.shape,-1)
array([[[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1],
[4, 6, 3, 3, 1]],
[[8, 8, 9, 0, 9],
[0, 1, 0, 5, 5],
[4, 6, 3, 3, 1]]])
However it seems this approach is a bit slow.
I tested it with:
N,D,V,W = 20000,64,100,256
and it took an average of 674ms on my laptop(8 core, 16 ram)
Can someone please recommend an faster yet still simple approach?

Splitting a number and assigning to elements in a row in a numpy array

How to place a list of numbers in to a 2D numpy array, where the second dimension of the array is equal to the number of digits of the largest number of that list? I also want the elements that don't belong to the original number to be zero in each row of the returning array.
Example:
From the list a = range(0,1001), how to get the numpy array of the below form:
[[0,0,0,0],
[0,0,0,1],
[0,0,0,2],
...
[0,9,9,8]
[0,9,9,9],
[1,0,0,0]]
Please note how the each number is placed in-place in a np.zeros((1000,4)) array at the end of the each row.
NB: A pythonic, vectorized implementation is expected
Broadcasting again!
def split_digits(a):
N = int(np.log10(np.max(a))+1) # No. of digits
r = 10**np.arange(N,-1,-1) # 10-powered range array
return (np.asarray(a)[:,None]%r[:-1])//r[1:]
Sample runs -
In [224]: a = range(0,1001)
In [225]: split_digits(a)
Out[225]:
array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 0, 2],
...,
[0, 9, 9, 8],
[0, 9, 9, 9],
[1, 0, 0, 0]])
In [229]: a = np.random.randint(0,1000000,(7))
In [230]: a
Out[230]: array([431921, 871855, 636144, 541186, 410562, 89356, 476258])
In [231]: split_digits(a)
Out[231]:
array([[4, 3, 1, 9, 2, 1],
[8, 7, 1, 8, 5, 5],
[6, 3, 6, 1, 4, 4],
[5, 4, 1, 1, 8, 6],
[4, 1, 0, 5, 6, 2],
[0, 8, 9, 3, 5, 6],
[4, 7, 6, 2, 5, 8]])
Another concept using pandas str
def pir(a):
z = int(np.log10(np.max(a)))
s = pd.Series(a.astype(str))
zfilled = s.str.zfill(z + 1).sum()
a_ = np.array(list(zfilled)).reshape(-1, z + 1)
return a_.astype(int)
Using #Divakar's random array
a = np.random.randint(0,1000000,(7))
array([ 57190, 29950, 392317, 592062, 460333, 639794, 983647])
pir(a)
array([[0, 5, 7, 1, 9, 0],
[0, 2, 9, 9, 5, 0],
[3, 9, 2, 3, 1, 7],
[5, 9, 2, 0, 6, 2],
[4, 6, 0, 3, 3, 3],
[6, 3, 9, 7, 9, 4],
[9, 8, 3, 6, 4, 7]])

Convert string to integer pandas dataframe index

I have a pandas dataframe with a multiindex. Unfortunately one of the indices gives years as a string
e.g. '2010', '2011'
how do I convert these to integers?
More concretely
MultiIndex(levels=[[u'2010', u'2011'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, , ...]], names=[u'Year', u'Month'])
.
df_cbs_prelim_total.index.set_levels(df_cbs_prelim_total.index.get_level_values(0).astype('int'))
seems to do it, but not inplace. Any proper way of changing them?
Cheers,
Mike
Will probably be cleaner to do this before you assign it as index (as #EdChum points out), but when you already have it as index, you can indeed use set_levels to alter one of the labels of a level of your multi-index. A bit cleaner as your code (you can use index.levels[..]):
In [165]: idx = pd.MultiIndex.from_product([[1,2,3], ['2011','2012','2013']])
In [166]: idx
Out[166]:
MultiIndex(levels=[[1, 2, 3], [u'2011', u'2012', u'2013']],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
In [167]: idx.levels[1]
Out[167]: Index([u'2011', u'2012', u'2013'], dtype='object')
In [168]: idx = idx.set_levels(idx.levels[1].astype(int), level=1)
In [169]: idx
Out[169]:
MultiIndex(levels=[[1, 2, 3], [2011, 2012, 2013]],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
You have to reassign it to save the changes (as is done above, in your case this would be df_cbs_prelim_total.index = df_cbs_prelim_total.index.set_levels(...))