Determine number of preceding equal elements - numpy

Using numpy, given a sorted 1D array, how to efficiently obtain a 1D array with equal size where the value at each position is the number of preceding equal elements? I have very large arrays and processing each element in Python code one way or another is not acceptable.
Example:
input = [0, 0, 4, 4, 4, 5, 5, 5, 5, 6]
output = [0, 1, 0, 1, 2, 0, 1, 2, 3, 0]

import numpy as np
A=np.array([0, 0, 4, 4, 4, 5, 5, 5, 5, 6])
uni,counts=np.unique(A, return_counts=True)
out=np.concatenate([np.arange(n) for n in counts])
print(out)
I am not certain about the efficiency (there is probably a better way to build out than concatenating), but this is a very straightforward way to get the result you are looking for. It counts the occurrences of each unique element, runs np.arange on each count to get the ascending sequence, then concatenates these arrays together. Note that the list comprehension still loops in Python once per unique value, though not once per element.
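If the Python-level loop over unique values is a concern, one possible fully vectorized variant is sketched below. The function name preceding_equal_counts is made up for illustration, and the sketch assumes the input is already sorted, as stated in the question. It subtracts the start index of each run from every element's position.

```python
import numpy as np

def preceding_equal_counts(sorted_arr):
    """Hypothetical helper: for a sorted 1D array, return how many equal
    elements precede each position."""
    sorted_arr = np.asarray(sorted_arr)
    _, counts = np.unique(sorted_arr, return_counts=True)
    # Start index of each run of equal values in the sorted input
    starts = np.cumsum(counts) - counts
    # Position of each element minus the start of its own run
    return np.arange(sorted_arr.size) - np.repeat(starts, counts)

A = np.array([0, 0, 4, 4, 4, 5, 5, 5, 5, 6])
out = preceding_equal_counts(A)
print(out)  # [0 1 0 1 2 0 1 2 3 0]
```

This keeps all work inside NumPy calls, at the cost of a few temporary arrays the same size as the input.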

Related

Numpy Interpolation for Array of Arrays

I have an array of arrays that I want to interpolate based on each array's min and max.
For a simple m x n array with values ranging from 0 to 1, I can do this as follows:
x_inp=np.interp(x,(x.min(),x.max()),(0,0.7))
This rescales every value into the range 0 to 0.7. However, if I have an array of dimension 100 x m x n, the above method uses the global min/max rather than the individual min/max of each m x n array.
Edit:
For example
x1=np.random.randint(0,5, size=(2, 4))
x2=np.random.randint(6,10, size=(2, 4))
my_list=[x1,x2]
my_array=np.asarray(my_list)
print(my_array)
>> array([[[1, 4, 3, 4],
           [3, 2, 0, 0]],
          [[9, 6, 8, 6],
           [8, 7, 6, 7]]])
my_array now has dimension 2x2x4, and my_array.min() and my_array.max() give me 0 and 9. So if I interpolate, it won't work based on the min/max of the individual 2x4 arrays. What I want is for the interpolation to use a min/max of 0/4 for the first array and 6/9 for the second.
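One way this could be approached is sketched below, under the assumption that the np.interp call above is only used as a linear rescale from [min, max] to [0, 0.7]. The helper name rescale_each and the handling of constant sub-arrays are illustrative choices, not something from the question. The per-array min and max are computed with keepdims=True so that they broadcast back against the original array.

```python
import numpy as np

def rescale_each(arr, lo=0.0, hi=0.7):
    """Hypothetical helper: linearly map each arr[i] from its own
    [min, max] to [lo, hi]."""
    arr = np.asarray(arr, dtype=float)
    axes = tuple(range(1, arr.ndim))            # every axis except the first
    mins = arr.min(axis=axes, keepdims=True)    # shape (k, 1, 1) for 3D input
    maxs = arr.max(axis=axes, keepdims=True)
    span = maxs - mins
    # Assumption: a constant sub-array (span == 0) is mapped to lo
    safe = np.where(span == 0, 1.0, span)
    return lo + (arr - mins) / safe * (hi - lo)

my_array = np.array([[[1, 4, 3, 4], [3, 2, 0, 0]],
                     [[9, 6, 8, 6], [8, 7, 6, 7]]])
scaled = rescale_each(my_array)
```

Here scaled[0] is rescaled using 0/4 and scaled[1] using 6/9, with no Python loop over the first axis.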

How to compute how many elements in three arrays in Python are equal to some value in the same position across the arrays?

I have three numpy arrays
a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]
and I want to compute how many elements in a, b, c are equal to 3 in the same position, so for this example the answer would be 3.
An elegant solution is to use NumPy functions, like:
np.count_nonzero(np.vstack([a, b, c])==3, axis=0).max()
Details:
np.vstack([a, b, c]) - generates an array with 3 rows, composed of your 3 source arrays.
np.count_nonzero(...==3, axis=0) - counts how many values equal to 3 occur in each column. For your data the result is array([0, 0, 1, 3, 0], dtype=int64).
max() - takes the greatest value, in your case 3.
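The steps above can be run end to end as in the following sketch. The variable names stacked, per_position, best and all_match are illustrative, and the last line is an optional extra that lists the positions where all three arrays hold the value.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([5, 1, 7, 3, 9])
c = np.array([10, 1, 3, 3, 1])

stacked = np.vstack([a, b, c])                          # shape (3, 5)
per_position = np.count_nonzero(stacked == 3, axis=0)   # matches per column
best = per_position.max()                               # largest count at any position
# Positions where every one of the arrays equals 3
all_match = np.flatnonzero(per_position == stacked.shape[0])

print(per_position)  # [0 0 1 3 0]
print(best)          # 3
print(all_match)     # [3]
```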

numpy find values of maxima pointed to by argmax [duplicate]

I have a 3-d array. I find the indexes of the maxima along an axis using argmax. How do I now use these indexes to obtain the maximal values?
2nd part: How to do this for arrays of N-d?
Eg:
In [124]: u = np.arange(12).reshape(3,4,1)
In [125]: e = u.argmax(axis=2)
In [126]: e
Out[126]:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
It would be nice if u[e] produced the expected results, but it doesn't work.
The return value of argmax along an axis can't be simply used as an index. It only works in a 1d case.
In [124]: u = np.arange(12).reshape(3,4,1)
In [125]: e = u.argmax(axis=2)
In [126]: u.shape
Out[126]: (3, 4, 1)
In [127]: e.shape
Out[127]: (3, 4)
e is (3,4), but its values only index the last dimension of u.
In [128]: u[e].shape
Out[128]: (3, 4, 4, 1)
Instead we have to construct indices for the other 2 dimensions, ones which broadcast with e. For example:
In [129]: I,J=np.ix_(range(3),range(4))
In [130]: I
Out[130]:
array([[0],
[1],
[2]])
In [131]: J
Out[131]: array([[0, 1, 2, 3]])
Those are (3,1) and (1,4), which broadcast against the (3,4) e to give the desired (3,4) output:
In [132]: u[I,J,e]
Out[132]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
This kind of question has been asked before, so it should probably be marked as a duplicate. The fact that your last dimension is size 1, and hence e is all 0s, distracts readers from the underlying issue (using a multidimensional argmax as an index).
numpy: how to get a max from an argmax result
Get indices of numpy.argmax elements over an axis
Assuming you've taken the argmax on the last dimension
In [156]: ij = np.indices(u.shape[:-1])
In [157]: u[(*ij,e)]
Out[157]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
or:
ij = np.ix_(*[range(i) for i in u.shape[:-1]])
If the axis is in the middle, it'll take a bit more tuple fiddling to arrange the ij elements and e.
So, for a general N-d array:
dims = np.ix_(*[range(x) for x in u.shape[:-1]])
u[(*dims,e)]
Before Python 3.11, u[*dims,e] is a syntax error, so wrap the indices in a tuple as above or call u.__getitem__((*dims,e)) directly.
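For an argmax taken along an arbitrary axis, an alternative that avoids the tuple fiddling is np.take_along_axis (available in NumPy 1.15 and later). The sketch below is not from the answers above, and the helper name max_from_argmax is made up for illustration. It re-inserts the reduced axis into the index array so that it lines up with u.

```python
import numpy as np

def max_from_argmax(u, axis):
    """Hypothetical helper: recover the maxima along `axis` from argmax indices."""
    idx = u.argmax(axis=axis)
    # take_along_axis needs idx to have the same number of dimensions as u
    picked = np.take_along_axis(u, np.expand_dims(idx, axis=axis), axis=axis)
    return np.squeeze(picked, axis=axis)

rng = np.random.default_rng(0)
u = rng.integers(0, 100, size=(3, 4, 5))
for axis in range(u.ndim):
    # The values picked through argmax should equal the plain max
    assert np.array_equal(max_from_argmax(u, axis), u.max(axis=axis))
```

If only the maximal values are needed, u.max(axis=...) is simpler; the indexing route is useful when the same indices must be applied to a second array of the same shape.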

Slice pandas.DataFrame's second Multiindex

I have a pandas Dataframe of the form
"a" "b" "c" #first level index
0, 1, 2 0, 1, 2 0, 1, 2 #second level index
index
0 1,2,3 6,7,8 5,3,4
1 2,3,4 7,5,4 9,2,5
2 3,4,5 4,5,6 0,4,5
...
representing a spot (a, b or c) where a measurement took place and the results of the measurements (0, 1, 2) that took place at this spot.
I want to do the following:
pick a slice in the sample (say the first measurement on each spot at measurement 0)
take the mean of each i-th measurement across spots (mean("a"[0], "b"[0], "c"[0]), mean("a"[1], "b"[1], "c"[1]), ...)
I tried to get the hang of the pandas MultiIndex documentation but did not manage to slice on the second level.
This is the column index:
MultiIndex(levels=[['a', 'b', 'c', ... , 'y'], [0, 1, 2, ... , 49]],
labels=[[0, 0, 0, ... , 0, 1, 1, 1, ... 1, ..., 49, 49, 49, ... 49]])
And the index
Float64Index([204.477752686, 204.484664917, 204.491577148, ..., 868.723022461], dtype='float64', name='wavelength', length=43274)
Using
df[:][0]
yields a key-error (0 not in index)
df.iloc[0]
returns the horizontal slice
0 "a":(1,2,3), "b":(6,7,8), "c":(5,3,4)
but I would like to have
"a":(1,2,3), "b":(6,7,4), "c":(5,9,0)
THX for any help
PS: version:pandas-0.19, python-3.4
The trick was to specify the axis:
df.loc(axis=1)[:,0]
provides the 0-th measurement of each spot.
Since I use integers on the second level of the index, I am not sure whether this actually selects the label 0 or just the 0-th measurement in the DataFrame, regardless of label.
But for my use case, this is sufficient.
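A small reconstruction of the example table may help here. It is a sketch: the DataFrame below is rebuilt from the numbers in the question, and the names first, first_flat and means are illustrative. Since .loc is label-based, the slice selects the label 0 on the second column level. The per-measurement mean across spots is computed by grouping the transposed frame on that level, which is one of several ways to do it.

```python
import numpy as np
import pandas as pd

# Columns: spots a, b, c on level 0; measurements 0, 1, 2 on level 1
cols = pd.MultiIndex.from_product([["a", "b", "c"], [0, 1, 2]])
data = np.array([[1, 2, 3, 6, 7, 8, 5, 3, 4],
                 [2, 3, 4, 7, 5, 4, 9, 2, 5],
                 [3, 4, 5, 4, 5, 6, 0, 4, 5]])
df = pd.DataFrame(data, columns=cols)

# Label-based slice: every spot, measurement label 0 (keeps both column levels)
first = df.loc(axis=1)[:, 0]

# Same selection, but with the second level dropped from the columns
first_flat = df.xs(0, axis=1, level=1)

# Mean of each i-th measurement across spots a, b, c
means = df.T.groupby(level=1).mean().T
```

Here first holds the columns ("a", 0), ("b", 0), ("c", 0), and means has one column per measurement label.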

Numpy Indexing Behavior

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In the example I am working with, let's say that I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 1D array of 100 values between 0 and 9 (column indices for A). In MATLAB, I would use A(sub2ind(size(A), 1:size(A,1)', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, but that isn't what happens here with B. How is : different from range(sizeArray), which just creates the indices 0 through sizeArray-1, when used as an index?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3, 2, 1],
[ 7, 6, 5],
[11, 10, 9]])
If instead I index the rows with a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3], X[1,2], X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I gave it a column vector to index rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3, 2, 1],
[ 7, 6, 5],
[11, 10, 9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]]),
array([[3, 2, 1],
[3, 2, 1],
[3, 2, 1]])]
numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
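Applied to the 100x10 case from the question, the difference can be checked directly. The sketch below uses a seeded random generator and illustrative names (outer, paired, same). It shows that the paired result is the diagonal of the slice-based result, and that np.take_along_axis gives the same values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 10))
B = rng.integers(0, 10, size=100)       # one column index per row of A

# Slice + index array: all rows, with the columns listed in B -> (100, 100)
outer = A[:, B]

# Two index arrays: paired up element by element -> (100,)
paired = A[np.arange(A.shape[0]), B]

# The paired result is the diagonal of the slice-based result
assert np.array_equal(paired, np.diagonal(outer))

# Equivalent selection with take_along_axis
same = np.take_along_axis(A, B[:, None], axis=1)[:, 0]
assert np.array_equal(paired, same)
```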