Splitting a number and assigning to elements in a row in a numpy array

How can I place a list of numbers into a 2D numpy array, where the second dimension of the array equals the number of digits of the largest number in the list? I also want the positions that don't belong to a given number to be zero in each row of the resulting array.
Example:
From the list a = range(0,1001), how do I get a numpy array of the form below:
[[0,0,0,0],
[0,0,0,1],
[0,0,0,2],
...
[0,9,9,8],
[0,9,9,9],
[1,0,0,0]]
Note how the digits of each number are right-aligned in a np.zeros((1001,4)) array, i.e. placed at the end of each row.
NB: A pythonic, vectorized implementation is expected

Broadcasting again!
import numpy as np

def split_digits(a):
    N = int(np.log10(np.max(a)) + 1)   # No. of digits of the largest number
    r = 10**np.arange(N, -1, -1)       # 10-powered range array: 10**N, ..., 10, 1
    return (np.asarray(a)[:, None] % r[:-1]) // r[1:]
Sample runs -
In [224]: a = range(0,1001)
In [225]: split_digits(a)
Out[225]:
array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 0, 2],
...,
[0, 9, 9, 8],
[0, 9, 9, 9],
[1, 0, 0, 0]])
In [229]: a = np.random.randint(0,1000000,(7))
In [230]: a
Out[230]: array([431921, 871855, 636144, 541186, 410562, 89356, 476258])
In [231]: split_digits(a)
Out[231]:
array([[4, 3, 1, 9, 2, 1],
[8, 7, 1, 8, 5, 5],
[6, 3, 6, 1, 4, 4],
[5, 4, 1, 1, 8, 6],
[4, 1, 0, 5, 6, 2],
[0, 8, 9, 3, 5, 6],
[4, 7, 6, 2, 5, 8]])
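One caveat with the log10 digit count: np.log10(0) is -inf, so an input whose largest value is 0 breaks it. A minimal variant, assuming non-negative integers, counts digits from the string form of the maximum and then extracts each digit with (a // 10**k) % 10 (a swapped-in formula, not the one above). split_digits_safe is a hypothetical name:

```python
import numpy as np

def split_digits_safe(a):
    """Right-aligned digit matrix for non-negative integers (hypothetical helper)."""
    a = np.asarray(a)
    n = len(str(int(a.max())))                # digits of the largest value; 0 -> 1 digit
    powers = 10 ** np.arange(n - 1, -1, -1)   # 10**(n-1), ..., 10, 1
    # integer-divide by each power, keep the last digit of each quotient
    return (a[:, None] // powers) % 10

out = split_digits_safe(range(0, 1001))
```

Each row of out holds the digits of one number, left-padded with zeros, and an all-zero input yields a single column instead of failing.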

Another approach, using the pandas .str accessor:
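import numpy as np
import pandas as pd

def pir(a):
    z = int(np.log10(np.max(a)))            # no. of digits minus one
    s = pd.Series(a.astype(str))            # numbers as strings (a must be a numpy array)
    zfilled = s.str.zfill(z + 1).sum()      # left-pad with zeros, concatenate into one string
    a_ = np.array(list(zfilled)).reshape(-1, z + 1)  # one character per cell
    return a_.astype(int)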
Using Divakar's random-array setup:
a = np.random.randint(0,1000000,(7))
array([ 57190, 29950, 392317, 592062, 460333, 639794, 983647])
pir(a)
array([[0, 5, 7, 1, 9, 0],
[0, 2, 9, 9, 5, 0],
[3, 9, 2, 3, 1, 7],
[5, 9, 2, 0, 6, 2],
[4, 6, 0, 3, 3, 3],
[6, 3, 9, 7, 9, 4],
[9, 8, 3, 6, 4, 7]])

Related

Numpy Vectorization: add row above to current row on ndarray

I would like to add the values in the above row to the row below using vectorization. For example, if I had the ndarray,
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]]
Then after one iteration through this method, it would result in
[[0, 0, 0, 0],
[1, 1, 1, 1],
[3, 3, 3, 3],
[5, 5, 5, 5]]
One can simply do this with a for loop:
import numpy as np

def addAboveRow(arr):
    cpy = arr.copy()
    r, c = arr.shape
    for i in range(1, r):
        for j in range(c):
            cpy[i][j] += arr[i - 1][j]
    return cpy

ndarr = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]).reshape(4, 4)
print(addAboveRow(ndarr))
I'm not sure how to approach this using vectorization, though. I think slicing should be used? Also, I'm not really sure how to deal with the top border, because nothing should be added to the first row. Any help would be appreciated. Thanks!
Note: I am really new to vectorization so an explanation would be great!
You can use slicing directly (here a is the input array):
b = np.zeros_like(a)
b[0] = a[0]
b[1:] = a[1:] + a[:-1]
>>> b
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[3, 3, 3, 3],
[5, 5, 5, 5]])
An alternative:
b = a.copy()
b[1:] += a[:-1]
Or:
b = a.copy()
np.add(b[1:], a[:-1], out=b[1:])
You could also try the following (note that np.put modifies arr in place):
np.put(arr, np.arange(arr.shape[1], arr.size), arr[1:]+arr[:-1])
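The top-border concern from the question is handled by the slices themselves: out[1:] and arr[:-1] both have r - 1 rows, so row 0 is never touched. A small sketch comparing the question's loop with the slicing version (the function names here are hypothetical):

```python
import numpy as np

def add_above_row_loop(arr):
    # the question's nested-loop version, kept as a reference
    cpy = arr.copy()
    r, c = arr.shape
    for i in range(1, r):
        for j in range(c):
            cpy[i][j] += arr[i - 1][j]
    return cpy

def add_above_row_sliced(arr):
    # rows 1..r-1 get the row above added; row 0 is left unchanged
    out = arr.copy()
    out[1:] += arr[:-1]
    return out

ndarr = np.arange(4).repeat(4).reshape(4, 4)   # same array as in the question
```

Because the right-hand side reads from the untouched original arr, each row receives the original row above it, not an already-updated one.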

Numpy: Fast approach to extract rows from an array using a 2D index array

I have 2 arrays a and b:
N,D,V,W = 2,3,4,5
a = np.random.randint(0,V,N*D).reshape(N,D)
a
array([[2, 3, 3],
[2, 0, 3]])
b = np.random.randint(0,10,V*W).reshape(V,W)
b
array([[0, 1, 0, 5, 5],
[0, 3, 6, 8, 7],
[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1]])
What I need to do is to replace every element of array a with a row from array b using the array a element value as the row index of array b.
At the moment I'm doing it this way, which works fine:
b[a.ravel(),:].reshape(*a.shape,-1)
array([[[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1],
[4, 6, 3, 3, 1]],
[[8, 8, 9, 0, 9],
[0, 1, 0, 5, 5],
[4, 6, 3, 3, 1]]])
However, this approach seems a bit slow.
I tested it with:
N,D,V,W = 20000,64,100,256
and it took an average of 674 ms on my laptop (8 cores, 16 GB RAM).
Can someone please recommend a faster yet still simple approach?
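For reference, NumPy integer-array indexing already accepts a 2D index array and keeps its shape, so the ravel/reshape round trip may be unnecessary. A minimal sketch checking that b[a] and np.take(b, a, axis=0) give the same array as the expression above; it makes no claim about which is fastest, which would need timing on the real data:

```python
import numpy as np

N, D, V, W = 2, 3, 4, 5
rng = np.random.default_rng(0)
a = rng.integers(0, V, size=(N, D))   # row indices into b
b = rng.integers(0, 10, size=(V, W))  # rows to gather

ref = b[a.ravel(), :].reshape(*a.shape, -1)  # the question's approach
direct = b[a]                                # result shape is a.shape + (W,)
taken = np.take(b, a, axis=0)                # same gather via np.take
```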

Convert string to integer pandas dataframe index

I have a pandas DataFrame with a MultiIndex. Unfortunately, one of the levels stores years as strings,
e.g. '2010', '2011'.
How do I convert these to integers?
More concretely
MultiIndex(levels=[[u'2010', u'2011'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, , ...]], names=[u'Year', u'Month'])
df_cbs_prelim_total.index.set_levels(df_cbs_prelim_total.index.get_level_values(0).astype('int'))
seems to do it, but not in place. Is there a proper way to change them?
Cheers,
Mike
It will probably be cleaner to do this before you set it as the index (as EdChum points out), but when you already have it as an index, you can indeed use set_levels to change the values of one level of your MultiIndex. Here is a slightly cleaner version of your code, using index.levels[...] (which holds the unique values of a level, whereas get_level_values returns one value per row):
In [165]: idx = pd.MultiIndex.from_product([[1,2,3], ['2011','2012','2013']])
In [166]: idx
Out[166]:
MultiIndex(levels=[[1, 2, 3], [u'2011', u'2012', u'2013']],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
In [167]: idx.levels[1]
Out[167]: Index([u'2011', u'2012', u'2013'], dtype='object')
In [168]: idx = idx.set_levels(idx.levels[1].astype(int), level=1)
In [169]: idx
Out[169]:
MultiIndex(levels=[[1, 2, 3], [2011, 2012, 2013]],
labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
You have to reassign the result to keep the changes (as is done above; in your case this would be df_cbs_prelim_total.index = df_cbs_prelim_total.index.set_levels(...)).
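A self-contained sketch of the same reassignment on a DataFrame, assuming a hypothetical index whose Year level holds strings (the column name value and the data are made up for illustration). set_levels accepts a level name as well as a position:

```python
import pandas as pd

# hypothetical data: a Year level stored as strings, as in the question
idx = pd.MultiIndex.from_product([['2010', '2011'], range(3)], names=['Year', 'Month'])
df = pd.DataFrame({'value': range(6)}, index=idx)

# set_levels returns a new MultiIndex, so the result must be assigned back
df.index = df.index.set_levels(df.index.levels[0].astype(int), level='Year')
```

Only the two unique level values are converted; the per-row codes are unchanged, so every row keeps its position.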

Select elements of a numpy array based on the elements of a second array

Consider a numpy array A of shape (7,6)
A = array([[0, 1, 2, 3, 5, 8],
[4, 100, 6, 7, 8, 7],
[8, 9, 10, 11, 5, 4],
[12, 13, 14, 15, 1, 2],
[1, 3, 5, 6, 4, 8],
[12, 23, 12, 24, 4, 3],
[1, 3, 5, 7, 89, 0]])
together with a second numpy array r of the same shape, which contains the radius of each element of A measured from the central point (3,2), where r[3,2] = 0:
r = array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
I would like to pick out all the elements of A located where r equals 1, i.e. [9,10,11,15,4,6,5,13], then all the elements of A located where r equals 2, and so on. Is there some numpy function to do that?
Thank you
You can select a section of A with something like A[r == 1]. To get all the sections as a list, you could do [A[r == i] for i in range(r.max() + 1)]. This works, but it may be inefficient when r takes many distinct values, because r == i is a full pass over r for every i.
You could also use this trick: first sort A by r, then split the sorted A at the right places. That looks something like this:
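r_flat = r.ravel()
order = r_flat.argsort(kind='stable')   # stable sort keeps row-major order within each radius
A_sorted = A.ravel()[order]
r_sorted = r_flat[order]
# edges[i] = position just past the last element with radius i
edges = r_sorted.searchsorted(np.arange(r_sorted[-1] + 1), 'right')
sections = []
start = 0
for end in edges:
    sections.append(A_sorted[start:end])
    start = end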
I get a slightly different answer from the one you were expecting (3, not 4, from the row with index 4) and a slightly different order (strictly row by row, then column), but:
>>> A
array([[ 0, 1, 2, 3, 5, 8],
[ 4, 100, 6, 7, 8, 7],
[ 8, 9, 10, 11, 5, 4],
[ 12, 13, 14, 15, 1, 2],
[ 1, 3, 5, 6, 4, 8],
[ 12, 23, 12, 24, 4, 3],
[ 1, 3, 5, 7, 89, 0]])
>>> r
array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
>>> A[r==1]
array([ 9, 10, 11, 13, 15, 3, 5, 6])
Alternatively, you can get column then row ordering by transposing both arrays:
>>> A.T[r.T==1]
array([ 9, 13, 3, 10, 5, 11, 15, 6])
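The sort-then-split trick can also be written with np.bincount and np.split instead of searchsorted and a loop. A sketch under the assumption that r holds non-negative integers; group_by_radius is a hypothetical helper name:

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 5, 8],
              [4, 100, 6, 7, 8, 7],
              [8, 9, 10, 11, 5, 4],
              [12, 13, 14, 15, 1, 2],
              [1, 3, 5, 6, 4, 8],
              [12, 23, 12, 24, 4, 3],
              [1, 3, 5, 7, 89, 0]])
r = np.array([[3, 3, 3, 3, 3, 4],
              [2, 2, 2, 2, 2, 3],
              [2, 1, 1, 1, 2, 3],
              [2, 1, 0, 1, 2, 3],
              [2, 1, 1, 1, 2, 3],
              [2, 2, 2, 2, 2, 3],
              [3, 3, 3, 3, 3, 4]])

def group_by_radius(A, r):
    """Return a list whose i-th entry holds the elements of A where r == i."""
    r_flat = r.ravel()
    order = np.argsort(r_flat, kind='stable')  # stable: row-major order within a group
    A_sorted = A.ravel()[order]
    counts = np.bincount(r_flat)               # counts[i] = no. of elements with radius i
    # split at the cumulative counts, dropping the final total
    return np.split(A_sorted, np.cumsum(counts)[:-1])

sections = group_by_radius(A, r)
```

A radius value that never occurs simply produces an empty section, so sections[i] always lines up with radius i.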

numpy custom array element retrieval

I have a question regarding how to extract certain values from a 2D numpy array
Foo =
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
Bar =
array([[0, 0, 1],
[1, 2, 3]])
I want to extract elements from Foo using the values of Bar as indices, such that I end up with a 2D matrix/array Baz of the same shape as Bar. The ith column of Baz corresponds to Foo[(np.array(each j in Bar[:,i]),np.array(i,i,i,i ...))]
Baz =
array([[ 1, 2, 6],
[ 4, 8, 12]])
I could do a couple of nested for-loops, but I was wondering if there is a more elegant, numpy-ish way to do this.
Sorry if this is a bit convoluted. Let me know if I need to explain further.
Thanks!
You can use Bar as the row index and an array [0, 1, 2] as the column index:
# for easy copy-pasting
import numpy as np
Foo = np.array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])
# now use Bar as the `i` coordinate and 0, 1, 2 as the `j` coordinate:
Foo[Bar, [0, 1, 2]]
# array([[ 1, 2, 6],
# [ 4, 8, 12]])
# OR, to generate the [0, 1, 2] automatically
Foo[Bar, np.arange(Bar.shape[1])]
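The same "row index from Bar, column index from position" pattern is what np.take_along_axis does along axis 0. A minimal sketch showing that it matches the explicit fancy-indexing form:

```python
import numpy as np

Foo = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])

# take_along_axis pairs Bar[i, j] with column j: Baz[i, j] = Foo[Bar[i, j], j]
Baz = np.take_along_axis(Foo, Bar, axis=0)

# explicit fancy indexing; the column index broadcasts against Bar's rows
Baz2 = Foo[Bar, np.arange(Bar.shape[1])]
```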