Using numpy.argsort() gives wrong indices array - numpy

I'm new to numpy, so I might be missing something obvious here.
The following small argsort() test script gives strange results. Any directions?
import numpy as np
a = np.array([[3, 5, 6, 4, 1], [2, 7, 4, 1, 2], [8, 6, 7, 2, 1]])
print(a)
print(a.argsort(axis=0))
print(a.argsort(axis=1))
output:
[[3 5 6 4 1]
[2 7 4 1 2]
[8 6 7 2 1]]
[[1 0 1 1 0] # bad 4th & 5th columns ?
[0 2 0 2 2]
[2 1 2 0 1]]
[[4 0 3 1 2] # what's going on here ?
[3 0 4 2 1]
[4 3 1 2 0]]

As others have indicated, this method is working correctly, so in order to provide an answer, here is an explanation of how .argsort() works: a.argsort() returns the indices (not the values) that would sort the array along the specified axis.
In your example
a = np.array([[3, 5, 6, 4, 1], [2, 7, 4, 1, 2], [8, 6, 7, 2, 1]])
print(a)
print(a.argsort(axis=0))
returns
[[3 5 6 4 1]
[2 7 4 1 2]
[8 6 7 2 1]]
[[1 0 1 1 0]
[0 2 0 2 2]
[2 1 2 0 1]]
because along the first column
[[3 ...
[2 ...
[8 ...
2 is the smallest value. Therefore the current index of 2 (which is 1) takes the first position along this axis in the matrix returned by argsort(). The second smallest value is 3, at index 0, so the second position along this axis in the returned matrix will be 0. Finally, the largest element is 8, which occurs at index 2 along the 0 axis, so the final element of the returned matrix will be 2. Thus:
[[1 ...
[0 ...
[2 ...
The same process is repeated for the other four columns along axis 0:
column [5, 7, 6] becomes ----> [0, 2, 1]
column [6, 4, 7] becomes ----> [1, 0, 2]
column [4, 1, 2] becomes ----> [1, 2, 0]
column [1, 2, 1] becomes ----> [0, 2, 1]
Changing the axis from 0 to 1 results in the same process being applied along sequences in the 1st axis:
[3 5 6 4 1] becomes ----> [4 0 3 1 2]
again because the smallest element is 1, which is at index 4, then 3 at index 0, then 4 at index 3, 5 at index 1, and finally 6 is the largest, at index 2.
As before, this process is repeated across each of the remaining rows:
[2 7 4 1 2] ----> [3 0 4 2 1]
[8 6 7 2 1] ----> [4 3 1 2 0]
giving
[[4 0 3 1 2]
[3 0 4 2 1]
[4 3 1 2 0]]
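You can verify these indices by feeding them back into the array. A minimal check (a sketch, assuming NumPy >= 1.15 for np.take_along_axis):
import numpy as np
a = np.array([[3, 5, 6, 4, 1], [2, 7, 4, 1, 2], [8, 6, 7, 2, 1]])
# reorder each column of a by the indices argsort produced;
# every column comes out sorted
print(np.take_along_axis(a, a.argsort(axis=0), axis=0))
[[2 5 4 1 1]
 [3 6 6 2 1]
 [8 7 7 4 2]]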

argsort() actually returns a sorted array whose elements are not the elements of the array we want to sort, but the indices of those elements.
For example, with axis=1 the first row of the result is [4 0 3 1 2]: this says the first element of the sorted row would be the element whose index is 4, which in turn is the value 1.
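A one-row sketch of the same idea, using fancy indexing (nothing here beyond the question's own data):
import numpy as np
a = np.array([[3, 5, 6, 4, 1], [2, 7, 4, 1, 2], [8, 6, 7, 2, 1]])
i = a[0].argsort()  # [4 0 3 1 2]
print(a[0][i])      # [1 3 4 5 6] -- the first row, sorted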

Permuting entire rows in a 2d numpy array

Consider the numpy array arr, shown below:
arr = np.array([[1, 5, 6, 3, 3, 7],
                [2, 2, 2, 2, 2, 2],
                [0, 1, 0, 1, 0, 1],
                [4, 8, 4, 8, 4, 8],
                [1, 2, 3, 4, 5, 6]])
I want to find all row permutations of arr. NOTE: the order of elements in any given row is unchanged. It is the entire rows that are being permuted.
Because arr has 5 rows, there will be 5! = 120 permutations. I’m hoping these could be ‘stacked’ into a 3d array p, having shape (120, 5, 6):
p = [[[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[4, 8, 4, 8, 4, 8],
[1, 2, 3, 4, 5, 6]],
[[1, 5, 6, 3, 3, 7],
[2, 2, 2, 2, 2, 2],
[0, 1, 0, 1, 0, 1],
[1, 2, 3, 4, 5, 6],
[4, 8, 4, 8, 4, 8]],
… etc …
[[1, 2, 3, 4, 5, 6],
[4, 8, 4, 8, 4, 8],
[0, 1, 0, 1, 0, 1],
[2, 2, 2, 2, 2, 2],
[1, 5, 6, 3, 3, 7]]]
There is a lot of material online about permuting elements within rows, but I need help in permuting the entire rows themselves.
You can make use of itertools.permutations and np.argsort:
from itertools import permutations
import numpy as np
out = np.array([arr[np.argsort(idx)] for idx in permutations(range(5))])
print(out)
[[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[1 2 3 4 5 6]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 2 3 4 5 6]
[4 8 4 8 4 8]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 2 3 4 5 6]]
...
[[1 2 3 4 5 6]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]
[[4 8 4 8 4 8]
[1 2 3 4 5 6]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]]
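A note on why the argsort works here: for a permutation idx, np.argsort(idx) is its inverse permutation, and inversion is a bijection on the set of permutations, so the comprehension still visits all 120 row orders, just in a different sequence. A small sketch:
import numpy as np
idx = (2, 0, 1)
print(np.argsort(idx))  # [1 2 0] -- the inverse permutation of idx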
A similar answer, but you do not need to call .argsort() one more time:
from itertools import permutations
import numpy as np
arr = np.array([[1, 5, 6, 3, 3, 7],
                [2, 2, 2, 2, 2, 2],
                [0, 1, 0, 1, 0, 1],
                [4, 8, 4, 8, 4, 8],
                [1, 2, 3, 4, 5, 6]])
output = np.array([arr[i, :] for i in permutations(range(5))])
print(output)
[[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[4 8 4 8 4 8]
[1 2 3 4 5 6]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 2 3 4 5 6]
[4 8 4 8 4 8]]
[[1 5 6 3 3 7]
[2 2 2 2 2 2]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 2 3 4 5 6]]
...
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[2 2 2 2 2 2]
[0 1 0 1 0 1]
[1 5 6 3 3 7]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[1 5 6 3 3 7]
[2 2 2 2 2 2]]
[[1 2 3 4 5 6]
[4 8 4 8 4 8]
[0 1 0 1 0 1]
[2 2 2 2 2 2]
[1 5 6 3 3 7]]]
This is a bit faster; here are the speed comparisons:
%%timeit
output = np.array([arr[i, :] for i in permutations(range(5))])
381 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit
output = np.array([arr[np.argsort(idx)] for idx in permutations(range(5))])
863 µs ± 97.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Pandas grouping based on a column in a very specific format

I have a data-frame df -
a b c
0 1 5 0
1 1 6 1
2 1 7 0
3 3 8 0
I need to group it based on column c, like:
a b c
0 [1, 1] [5, 6] [0, 1]
1 1 7 0
2 3 8 0
It can be done by iterating over the df. Are there any other ways, more like pandas grouping or something?
Not sure but do you need this?
k = 0
temp = []
for i in df.c:
    if i == 0:
        k += 1
    temp.append(k)
df = df.groupby(temp).agg(list)
Output:
a b c
1 [1, 1] [5, 6] [0, 1]
2 [1] [7] [0]
3 [3] [8] [0]
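A loop-free way to build the same grouping key (a sketch, assuming a new group should start at every 0 in column c) is a boolean cumsum:
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 1, 3],
                   'b': [5, 6, 7, 8],
                   'c': [0, 1, 0, 0]})
# (c == 0).cumsum() reproduces the keys built by the loop above: [1, 1, 2, 3]
print(df.groupby(df['c'].eq(0).cumsum()).agg(list))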
You don't need any loop. Here is a two-line solution:
Change index 0 to 1 so that you can make groups on the basis of the index.
Make groups on the basis of the index using groupby, and get the list of values of each column for each group.
df.rename(index={0:1}, inplace=True)
df = df.groupby(df.index).agg(list)
print(df)
a b c
1 [1, 1] [5, 6] [0, 1]
2 [1] [7] [0]
3 [3] [8] [0]

Convert an array of peaks to a series of steps that represent the most recent peak value

Given an array of peaks like this:
peaks = [0, 5, 0, 3, 2, 0, 1, 7, 0]
How can I create an array of steps that indicate the most recent peak value, like this:
steps = [0, 5, 5, 3, 3, 3, 3, 7, 7]
Requirements:
This will be used for image analysis on large 3D images (1000**3), so it needs to be fast, meaning no for loops or list comprehensions...only numpy vectorization.
The example I gave above is a linear list, but this needs to apply equally well to ND images. This means doing the operation along a single axis, while allowing for multiple axes at the same time.
Note
I recently asked a question that turned out to be a dupe (easily solved with scipy.maximum.accumulate), but my question also contained an optional 'would be nice if' twist, as outlined above. It turns out that I actually have a need for this second behavior as well, so I'm re-posting just this part.
Here is a solution that handles ND and can detect "broad peaks" like ..., 0, 4, 4, 4, 3, ... but not ..., 0, 4, 4, 4, 7, ....
import numpy as np
import operator as op
def keep_peaks(A, axis=-1):
    B = np.swapaxes(A, axis, -1)
    # take differences between consecutive elements along axis
    # pad with -1 at the start and the end
    # the most efficient way is to allocate first, because otherwise
    # padding would involve reallocation and a copy
    # note that in order to avoid that copy we use np.subtract and its
    # out kwd
    updown = np.empty((*B.shape[:-1], B.shape[-1]+1), B.dtype)
    updown[..., 0], updown[..., -1] = -1, -1
    np.subtract(B[..., 1:], B[..., :-1], out=updown[..., 1:-1])
    # extract indices where there is a change along axis
    chnidx = np.where(updown)
    # get the values of the changes
    chng = updown[chnidx]
    # find indices of indices 1) where we go up and 2) the next change is
    # down (note how the padded -1's at the end are useful here)
    # also include the beginning of each 1D subarray
    pkidx, = np.where((chng[:-1] > 0) & (chng[1:] < 0) | (chnidx[-1][:-1] == 0))
    # use indices of indices to retain only peak indices
    pkidx = (*map(op.itemgetter(pkidx), chnidx),)
    # construct array of changes of the result along axis
    # these will be zero everywhere
    out = np.zeros_like(A)
    aux = out.swapaxes(axis, -1)
    # except where there is a new peak
    # at these positions we need to put the differences of peak levels
    aux[(*map(op.itemgetter(slice(1, None)), pkidx),)] = np.diff(B[pkidx])
    # we could ravel the array and do the cumsum on that, but raveling
    # a potentially noncontiguous array is expensive
    # instead we keep the shape, at the cost of having to replace the
    # value at the beginning of each 1D subarray (we do not need the
    # "line-jump" difference but the plain 1st value there)
    aux[..., 0] = B[..., 0]
    # finally, use cumsum to go from differences to plain values
    return out.cumsum(axis=axis)
peaks = [0, 5, 0, 3, 2, 0, 1, 7, 0]
print(peaks)
print(keep_peaks(peaks))
# show off axis kwd and broad peak detection
peaks3d = np.kron(np.random.randint(0, 10, (3, 6, 3)), np.ones((1, 2, 1), int))
print(peaks3d.swapaxes(1, 2))
print(keep_peaks(peaks3d, 1).swapaxes(1, 2))
Sample run:
[0, 5, 0, 3, 2, 0, 1, 7, 0]
[0 5 5 3 3 3 3 7 7]
[[[5 5 3 3 1 1 4 4 9 9 7 7]
[2 2 9 9 3 3 4 4 3 3 7 7]
[9 9 0 0 2 2 5 5 7 7 9 9]]
[[1 1 3 3 9 9 3 3 7 7 0 0]
[1 1 1 1 4 4 5 5 0 0 3 3]
[5 5 5 5 8 8 1 1 2 2 7 7]]
[[6 6 3 3 8 8 2 2 3 3 2 2]
[6 6 9 9 3 3 9 9 3 3 9 9]
[1 1 5 5 7 7 2 2 7 7 1 1]]]
[[[5 5 5 5 5 5 5 5 9 9 9 9]
[2 2 9 9 9 9 4 4 4 4 7 7]
[9 9 9 9 9 9 9 9 9 9 9 9]]
[[1 1 1 1 9 9 9 9 7 7 7 7]
[1 1 1 1 1 1 5 5 5 5 3 3]
[5 5 5 5 8 8 8 8 8 8 7 7]]
[[6 6 6 6 8 8 8 8 3 3 3 3]
[6 6 9 9 9 9 9 9 9 9 9 9]
[1 1 1 1 7 7 7 7 7 7 7 7]]]
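For the 1D case, a much simpler sketch is possible (keep_peaks_1d is a hypothetical helper, not part of the answer above; it only detects strict interior peaks and, unlike the padded ND version, does not treat a rise at the very end of the array as a peak):
import numpy as np
def keep_peaks_1d(x):
    x = np.asarray(x)
    rise = np.diff(x) > 0
    fall = np.diff(x) < 0
    ispeak = np.zeros(x.shape, bool)
    ispeak[0] = True                     # keep the first value as a baseline
    ispeak[1:-1] = rise[:-1] & fall[1:]  # strict interior peaks
    # forward-fill the index of the most recent peak, then look up its value
    idx = np.where(ispeak, np.arange(x.size), 0)
    np.maximum.accumulate(idx, out=idx)
    return x[idx]
print(keep_peaks_1d([0, 5, 0, 3, 2, 0, 1, 7, 0]))  # [0 5 5 3 3 3 3 7 7]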

ValueError: operands could not be broadcast together with shapes when concatenating arrays across pandas columns

I am working with a pandas dataframe that looks something like this:
col1 col2 col3 col_num
0 [-0.20447069290738076, 0.4159556680196389, -0.... [-0.10935000772973974, -0.04425263358067333, -... [51.0834196, 10.4234469] 3160
1 [-0.42439951483476124, -0.3135960467759942, 0.... [0.3842614765721414, -0.06756644506033657, 0.4... [45.5643442, 17.0118954] 3159
3 [0.3158755226012898, -0.007057682056994253, 0.... [-0.33158941456615376, 0.09637640660002277, -0... [50.6402809, 4.6667145] 3157
5 [-0.011089723491692679, -0.01649481399305317, ... [-0.02827408211098023, 0.00019040943944721592,... [53.45733965, -2.22695880505223] 3157
I would like to concatenate the vectors across rows like so:
df['col1'] + df['col2'] + df['col3'] + df['col_num'].transform(lambda item: [item])
However I am prompted with the following error:
/opt/conda/lib/python3.6/site-packages/pandas/core/ops.py in <lambda>(x)
708 if is_object_dtype(lvalues):
709 return libalgos.arrmap_object(lvalues,
--> 710 lambda x: op(x, rvalues))
711 raise
712
ValueError: operands could not be broadcast together with shapes (30,) (86597,)
It looks like for some reason it's getting stuck at concatenating the 3rd column, which only has 2 elements per row. The data is 86597 rows long. How can I fix this error?
You can convert the problematic column to lists like:
df['col1'] + df['col2'] + df['col3'].apply(list) + df['col_num'].transform(lambda x: [x])
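The underlying difference, in a minimal illustration (hypothetical toy values): Python lists concatenate under +, while NumPy arrays of unequal length try to broadcast elementwise and fail:
import numpy as np
print([1, 2] + [3, 4, 5])  # [1, 2, 3, 4, 5] -- concatenation
try:
    np.array([1, 2]) + np.array([3, 4, 5])
except ValueError as e:
    print(e)  # operands could not be broadcast together ...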
Another solution is to convert all lists to 2d numpy arrays and use hstack, if the lists in each column all have the same length. Note that holding lists in DataFrame cells loses the vectorised functionality which goes with using NumPy arrays held in contiguous memory blocks:
import numpy as np
import pandas as pd
np.random.seed(123)
N = 10
df = pd.DataFrame({
    "col1": [np.random.randint(10, size=3) for i in range(N)],
    "col2": [np.random.randint(10, size=3) for i in range(N)],
    "col3": [np.random.randint(10, size=2) for i in range(N)],
    "col_num": range(N)
})
print (df)
col1 col2 col3 col_num
0 [2, 2, 6] [9, 3, 4] [2, 4] 0
1 [1, 3, 9] [6, 1, 5] [8, 1] 1
2 [6, 1, 0] [6, 2, 1] [2, 1] 2
3 [1, 9, 0] [8, 3, 5] [1, 3] 3
4 [0, 9, 3] [0, 2, 6] [5, 9] 4
5 [4, 0, 0] [2, 4, 4] [0, 8] 5
6 [4, 1, 7] [6, 3, 0] [1, 6] 6
7 [3, 2, 4] [6, 4, 7] [3, 3] 7
8 [7, 2, 4] [6, 7, 1] [5, 9] 8
9 [8, 0, 7] [5, 7, 9] [7, 9] 9
a = np.array(df['col1'].values.tolist())
b = np.array(df['col2'].values.tolist())
c = np.array(df['col3'].values.tolist())
#create Nx1 array
d = df['col_num'].values[:, None]
arr = np.hstack((a,b,c, d))
print (arr)
[[2 2 6 9 3 4 2 4 0]
[1 3 9 6 1 5 8 1 1]
[6 1 0 6 2 1 2 1 2]
[1 9 0 8 3 5 1 3 3]
[0 9 3 0 2 6 5 9 4]
[4 0 0 2 4 4 0 8 5]
[4 1 7 6 3 0 1 6 6]
[3 2 4 6 4 7 3 3 7]
[7 2 4 6 7 1 5 9 8]
[8 0 7 5 7 9 7 9 9]]
df = pd.DataFrame(arr)
print (df)
0 1 2 3 4 5 6 7 8
0 2 2 6 9 3 4 2 4 0
1 1 3 9 6 1 5 8 1 1
2 6 1 0 6 2 1 2 1 2
3 1 9 0 8 3 5 1 3 3
4 0 9 3 0 2 6 5 9 4
5 4 0 0 2 4 4 0 8 5
6 4 1 7 6 3 0 1 6 6
7 3 2 4 6 4 7 3 3 7
8 7 2 4 6 7 1 5 9 8
9 8 0 7 5 7 9 7 9 9

Change axis of matrix - Python (concatenate)

I would like to concatenate two matrices of different sizes
[[1 1 1]
[2 3 2]
[5 5 3]
[3 2 5]
[4 4 4]]
[[1 3 2 5 4]
[1 2 5 3 4]]
to have this matrix
[[1 1 1 1 1]
[2 3 2 3 2]
[5 5 3 2 5]
[3 2 5 5 3]
[4 4 4 4 4]]
I know the sizes of the matrices are different ((5, 3) vs. (2, 5)),
but I want to concatenate these matrices anyway.
Is it possible to change the axis of the second matrix?
Thanks!
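One way to get the desired (5, 5) result (a sketch: transpose the second matrix so the row counts match, then concatenate along axis 1):
import numpy as np
a = np.array([[1, 1, 1],
              [2, 3, 2],
              [5, 5, 3],
              [3, 2, 5],
              [4, 4, 4]])
b = np.array([[1, 3, 2, 5, 4],
              [1, 2, 5, 3, 4]])
# b.T has shape (5, 2), so it can be appended to the right of a
print(np.concatenate((a, b.T), axis=1))
[[1 1 1 1 1]
 [2 3 2 3 2]
 [5 5 3 2 5]
 [3 2 5 5 3]
 [4 4 4 4 4]]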