Numpy compare values inside to return greater index - numpy

I have a list of numpy arrays:
[array([-1.67397643, -2.77258872]), array([-1.67397643, -2.77258872]), array([-2.77258872, -1.67397643]), array([-2.77258872, -1.67397643])]
For each inner array I want the index of the larger value, e.g. -1.67397643 > -2.77258872, so the result for the first array would be 0.
The final output would be the numpy array [0, 0, 1, 1] (a list is fine too).
How can I do that?

It seems you have a list of arrays, so I would start by making them a proper numpy array:
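import numpy as np
a = [np.array([-1.67397643, -2.77258872]), np.array([-1.67397643, -2.77258872]), np.array([-2.77258872, -1.67397643]), np.array([-2.77258872, -1.67397643])]
b = np.array(a).T # .T transposes it, so b[0] holds the first values and b[1] the second values.
c = b[0] < b[1] # True where the second value is the larger one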
c is now an array([False, False, True, True], dtype=bool), and probably serves your purpose. If you must have [0,0,1,1] instead, then:
d = np.zeros(len(c))
d[c] = 1
d is now an array([ 0., 0., 1., 1.])
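As an alternative sketch (not part of the answer above), np.argmax along axis 1 returns the position of the larger value in each row directly, as integers. The names stacked and winners are just illustrative:

```python
import numpy as np

a = [np.array([-1.67397643, -2.77258872]),
     np.array([-1.67397643, -2.77258872]),
     np.array([-2.77258872, -1.67397643]),
     np.array([-2.77258872, -1.67397643])]

# Stack the list into a (4, 2) array, one inner array per row.
stacked = np.array(a)

# For each row, the index of its largest element.
winners = np.argmax(stacked, axis=1)
print(winners.tolist())  # [0, 0, 1, 1]
```

If you already have the boolean array c from above, c.astype(int) gives the same integers without the intermediate float array.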

Related

Equivalent of np.isin for TensorFlow

I have categories as a list of list integers as shown below:
categories = [
[0,2,4,6,8],
[1,3,5,7,9]
]
I have a label tensor y with num_batches integers (as classes):
y = tf.constant([0, 1, 1, 2, 5, 4, 7, 9, 3, 3])
I want to replace each value in y with the index of the category list that contains it (here 0 for even, 1 for odd), so that the final result would be:
cat_labels = tf.constant([0, 1, 1, 0, 1, 0, 1, 1, 1, 1])
I can get it by iterating through each value in y like below:
cat_labels = tf.Variable(tf.identity(y))
for idx in range(len(categories)):
    for i, _y in enumerate(y):
        if _y in categories[idx]: # if _y value is in categories[idx]
            cat_labels[i].assign(idx) # replace all of them with idx
But apparently iterating like this is not allowed when the block is wrapped in a @tf.function-decorated parent function.
Is there a way to apply the logic without iterating, or converting to numpy and applying np.isin, while getting speedups of tf.function?
Edit: There seem to be workarounds on this like here, but any help on explaining in the context of this use case would be appreciated.
You can try this:
import tensorflow as tf
y = tf.constant([0., 1., 1., 2., 5., 4., 7., 9., 3., 3.], dtype=tf.float32)
categories = [[0,2,4,6,8],[1,3,5,7,9]]
c = tf.convert_to_tensor(categories, dtype=tf.float32)
cat_labels = tf.map_fn(            # apply an operation on all of the elements of y
    lambda x: tf.gather_nd(        # get index of category: 0 or 1 or anything else
        tf.cast(                   # cast dtype of the result of the inner function
            tf.where(              # get index of the element of y in categories
                tf.equal(c, x)),   # search an element of y within categories
            dtype=tf.float32),
        [0, 0]),
    y)
tf.print(cat_labels, summarize=-1)
# [0 1 1 0 1 0 1 1 1 1]

numpy array of array with custom filtering

I am trying to filter a numpy array of array with given conditions, for example
input = np.array([[1,2,3],[4,5,6],[4,5,6],[0,9,19]])
I want the rows where [0] >= 4, [1] >= 5 and [2] >= 6:
expected result = np.array([[4,5,6],[4,5,6]])
What would be the best way to achieve this, with performance in mind?
Extended question: how do I retrieve the corresponding index in the input array for each output row?
You can do:
a = np.array([[1,2,3],[4,5,6],[4,5,6],[0,9,19]])
a[(a[:,0] >=4) & (a[:,1] >= 5) & (a[:,2] >=6)]
Here you create a boolean mask for the condition on each column of the data, combine the masks with a logical and, and finally use the resulting mask to select the matching rows.
To find the indices of the rows matching the conditions, you can use numpy's where() function:
idx = np.where((a[:,0] >=4) & (a[:,1] >= 5) & (a[:,2] >=6))[0]
As per your request, a numba version
import numpy as np
import numba as nb
import sys
import timeit
# Usage: python test.py <cmp|cmp_normal> <number_of_rows>
target = np.random.randint(low=-100000, high=100000, size=(int(sys.argv[2]), 3), dtype=np.int64)
comp = np.array([4, 5, 6], dtype=np.int64)
@nb.njit((nb.int64[:, :], nb.int64[:]), parallel=True)
def cmp(a, b):
    # Boolean mask: True where every column of the row is >= its threshold.
    c = np.empty((a.shape[0],), dtype=np.bool_)
    for i in nb.prange(a.shape[0]):
        c[i] = a[i][0] >= b[0] and a[i][1] >= b[1] and a[i][2] >= b[2]
    return c
def cmp_normal(a, b):
    # return np.all(a >= b, axis=1)
    return (a[:,0] >=b[0]) & (a[:,1] >= b[1]) & (a[:,2] >=b[2])
print(timeit.timeit(lambda: eval(sys.argv[1])(target, comp), number=10))
The first timing is for sequential numba, the second one for parallel numba.
Parallel numba gives a speedup of roughly 5x compared to sequential:
(base) xxx@xxx:~$ python test.py cmp 1000000
6.40756068899982
(base) xxx@xxx:~$ python test.py cmp 1000000
1.3425709140001345
Now vanilla numpy:
(base) xxx@xxx:~$ python test.py cmp_normal 1000000
4.04174472700015
Parallel numba is the fastest here. But if you return a[c] instead of the mask, numba slows down, so it depends on what you write.
In [223]: arr =np.array([[1,2,3],[4,5,6],[4,5,6],[0,9,19]])
In [224]: arr
Out[224]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 4,  5,  6],
       [ 0,  9, 19]])
Since you are testing one value per column, you can do a simple numpy comparison (the (3,) test list broadcasts with the (4,3) arr). Here I use ==, which for this data selects the same rows as the >= test in the question:
In [225]: arr==[4,5,6]
Out[225]:
array([[False, False, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [False, False, False]])
and where a whole row is true:
In [226]: (arr==[4,5,6]).all(axis=1)
Out[226]: array([False, True, True, False])
This can be applied as a boolean mask to select those rows from arr:
In [227]: arr[_]
Out[227]:
array([[4, 5, 6],
       [4, 5, 6]])
and the numeric indices:
In [228]: np.nonzero(__)
Out[228]: (array([1, 2]),)
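The same broadcasting idea works with the >= conditions from the question. A minimal sketch, where thresholds, mask, rows and idx are illustrative names rather than anything from the answers above:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6], [0, 9, 19]])
thresholds = np.array([4, 5, 6])  # one lower bound per column

# (4, 3) >= (3,) broadcasts column-wise; a row matches only if all columns pass.
mask = (arr >= thresholds).all(axis=1)

rows = arr[mask]            # the matching rows
idx = np.flatnonzero(mask)  # their positions in the input array

print(rows.tolist())  # [[4, 5, 6], [4, 5, 6]]
print(idx.tolist())   # [1, 2]
```

Building the mask once and reusing it for both the rows and the indices avoids evaluating the conditions twice.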

Get column-wise maximums from a NumPy array

I have a 2D array, say
x = np.random.rand(10, 3)
array([[ 0.51158246, 0.51214272, 0.1107923 ],
[ 0.5210391 , 0.85308284, 0.63227215],
[ 0.57239625, 0.06276943, 0.1069803 ],
[ 0.71627613, 0.66454443, 0.56771438],
[ 0.24595493, 0.01007568, 0.84959605],
[ 0.99158904, 0.25034553, 0.00144037],
[ 0.43292656, 0.9247424 , 0.5123086 ],
[ 0.07224077, 0.57230282, 0.88522979],
[ 0.55665913, 0.20119776, 0.58865823],
[ 0.55129624, 0.26226446, 0.63070611]])
Then I find the indexes of maximum elements along the columns:
indexes = np.argmax(x, axis=0)
array([5, 6, 7])
So far so good.
But how do I actually get those elements? That is, how do I get ?some_operation?(x, indexes) == [0.99158904, 0.9247424, 0.88522979]?
Note that I need both the indexes and the associated values.
The best I could come up with was x[indexes, range(x.shape[1])], but it looks kinda complicated and inefficient. Is there a more idiomatic way?
You can use np.amax to find max value along an axis.
Using your example (x is the original array in your post):
In[1]: np.argmax(x, axis=0)
Out[1]:
array([5, 6, 7], dtype=int64)
In[2]: np.amax(x, axis=0)
Out[2]:
array([ 0.99158904, 0.9247424 , 0.88522979])
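If you want to reuse the indexes you already computed instead of scanning the array a second time, np.take_along_axis (available in NumPy 1.15 and later) is one option. A small sketch with random data, so the actual numbers will differ from the ones in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 3))

# Row index of the maximum in each column.
indexes = np.argmax(x, axis=0)

# take_along_axis expects indices with the same number of dimensions as x,
# so add a leading axis and drop it again afterwards.
values = np.take_along_axis(x, indexes[np.newaxis, :], axis=0)[0]

# The fancy-indexing form from the question gives the same values.
fancy = x[indexes, np.arange(x.shape[1])]
```

Both forms only pick out three elements, so the fancy-indexing version from the question is not inefficient; it is mostly a matter of readability.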

numpy: copy some elements of two arrays into another array

I have two arrays and I am hoping to create an additional array that copies some of the values from the two arrays:
a = np.array([1,-2,-3,-3])
b = np.array([-2,1,-3,-2])
Hoping to get:
np.array([1,1,-3,-2])
I'm just trying to get the value 1 from both arrays into another array. The copying of the negative numbers doesn't matter as they get masked down the road.
Thanks @shridhar-r-kulkarni for asking for more detail rather than simply downvoting. It jogged my thinking so I could work it out.
a = np.array([1,-2,-3,-3])
b = np.array([-2,1,-3,-2])
c= np.full_like(a, np.nan, dtype=np.double)
# Copy the values of a that are > 0 into the same positions of c
c[np.where(a > 0)] = a[np.where(a > 0)]
# Copy the values of b that are > 0 into the same positions of c
c[np.where(b > 0)] = b[np.where(b > 0)]
# c is array([ 1., 1., nan, nan])
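If you would rather get exactly the array shown in the question (with the negative values carried over instead of nan), a one-line alternative is the three-argument form of np.where. This is a sketch of a different technique from the answer above, and it assumes that taking b's value wherever a is not positive is acceptable:

```python
import numpy as np

a = np.array([1, -2, -3, -3])
b = np.array([-2, 1, -3, -2])

# Element-wise: take a where a > 0, otherwise fall back to b.
c = np.where(a > 0, a, b)
print(c.tolist())  # [1, 1, -3, -2]
```

Unlike the nan-filled version, this keeps the integer dtype of the inputs.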

Removing all but last non-zero sequence from numpy array

The problem
I have a 1-dimensional numpy array filled mostly with zeros but also containing some groups of non-zero values.
>> import numpy as np
>> a = np.zeros(10)
>> a[2:4] = 2
>> a[6:9] = 3
>> print a
[ 0. 0. 2. 2. 0. 0. 3. 3. 3. 0.]
I want to get the array that contains only the last non-zero group. In other words, all but the last non-zero group should be replaced by zeros. (The groups could be only 1 element long). Like so:
[ 0. 0. 0. 0. 0. 0. 3. 3. 3. 0.]
Non-robust solution
This seems to do the trick. Reverse the array and find the first index where the change between elements is negative. Then replace all subsequent elements with zero. Then flip back. It's a bit long-winded:
>> b = a[::-1]
>> b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
[ 0. 0. 0. 0. 0. 0. 3. 3. 3. 0.]
Fails for a specific case
However, it is not robust and fails in the following case (because the where command returns an empty list of indices):
>> a = np.zeros(10)
>> a[0:4] = 2
>> print a
[ 2. 2. 2. 2. 0. 0. 0. 0. 0. 0.]
>> b = a[::-1]
>> b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
Traceback (most recent call last):
  File "<ipython-input-81-8cba57558ba8>", line 1, in <module>
    runfile('C:/Users/name/test1.py', wdir='C:/Users/name')
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/name/test1.py", line 21, in <module>
    b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
IndexError: index 0 is out of bounds for axis 0 with size 0
Fix
So I need to introduce an if clause:
>> b = a[::-1]
>> if len(np.where(np.ediff1d(b) < 0)[0]) > 0:
>>     b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
[ 2. 2. 2. 2. 0. 0. 0. 0. 0. 0.]
Is there a more elegant way to do it?
UPDATE
Following on from Divakar's excellent answer and mtrw's question, I would like to extend the specification. The method should also work if the input array has non-zero values that are negative and for groups of non-zero numbers that change within the grouping.
e.g. np.array([1, 0, 0, 4, 5, 4, 5, 0, 0])
This means methods where we check for a positive or negative difference between elements, in order to find the group boundaries, would not work so well.
Approach #1
Since we are after elegance, let's feed ourselves a one-liner -
a[:(a[1:] > a[:-1]).cumsum().argmax()] = 0
Sample run -
In [605]: a
Out[605]: array([ 0., 0., 2., 2., 0., 0., 3., 3., 3., 0.])
In [606]: a[:(a[1:] > a[:-1]).cumsum().argmax()] = 0
In [607]: a
Out[607]: array([ 0., 0., 0., 0., 0., 0., 3., 3., 3., 0.])
Approach #2
The approach above assumes that the numbers in the last group are all greater than 0. If that's not the case, or if a non-zero group might contain different numbers, let's feed one more line to have a generic solution -
mask = a != 0
a[:(mask[1:] > mask[:-1]).cumsum().argmax()] = 0
Sample run -
In [667]: a
Out[667]: array([-1, 0, 0, -4, -5, 4, -5, 0, 0])
In [668]: mask = a != 0
In [669]: a[:(mask[1:] > mask[:-1]).cumsum().argmax()] = 0
In [670]: a
Out[670]: array([ 0, 0, 0, -4, -5, 4, -5, 0, 0])
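For reuse, the mask-based idea from Approach #2 can be wrapped in a small function. The sketch below is a variant, not the answer's exact one-liner: it locates the group starts with np.flatnonzero instead of cumsum().argmax(), works on a copy so the input is not modified, and the function name keep_last_nonzero_group is made up for illustration:

```python
import numpy as np

def keep_last_nonzero_group(a):
    """Return a copy of 1-D array a with all but the last non-zero group zeroed."""
    out = np.array(a, copy=True)
    mask = out != 0
    # Positions where a non-zero element directly follows a zero,
    # i.e. where a new group starts (a group at index 0 is not listed).
    starts = np.flatnonzero(mask[1:] & ~mask[:-1]) + 1
    if starts.size:
        # Zero everything before the start of the last group.
        out[:starts[-1]] = 0
    return out

a = np.zeros(10)
a[2:4] = 2
a[6:9] = 3
print(keep_last_nonzero_group(a))

# Mixed-sign group from the updated specification.
print(keep_last_nonzero_group(np.array([1, 0, 0, 4, 5, 4, 5, 0, 0])))

# Single group at the start: nothing to remove, no IndexError.
print(keep_last_nonzero_group(np.array([2, 2, 2, 2, 0, 0])))
```

Because the group boundaries come from the a != 0 mask rather than from differences between values, negative numbers and values that change within a group are handled the same way as constant positive groups.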