How can I get the row of the first True in each column of a numpy matrix? - numpy

I have the following matrix defined:
d = np.array(
[[False, False, False, False, False, True],
[False, False, False, False, False, True],
[False, False, False, False, True, True],
[False, False, False, False, True, True],
[False, False, False, True, True, True],
[False, False, False, True, True, True],
[False, False, True, True, True, True],
[False, False, True, True, True, True],
[False, True, True, True, True, True],
[False, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[False, True, True, True, True, True],
[False, False, True, True, True, True],
[False, False, False, True, True, True],
[False, False, False, False, True, True],
[False, False, False, False, False, True],
[False, False, False, False, True, True],
[False, False, False, True, True, True],
[False, False, True, True, True, True],
[False, True, True, True, True, True],
[ True, True, True, True, True, True]])
And I would like to get a vector of length 6 containing the index of the first True occurrence in each column.
So the expected output would be:
fo = np.array([10, 8, 6, 4, 2, 0])
If there are no True values in a given column, it should ideally return NaN for that column.
I have tried:
np.sum(d, axis=0)
array([ 4, 8, 12, 16, 20, 23])
which, together with the column length, would give the index, but that only works if each column consists of exactly two contiguous regions: one of False followed by one of True.

You can do this using argmax, which returns the index of the first True in each column. The catch is columns that contain only False: argmax returns 0 for them as well, which is indistinguishable from a genuine first True at row 0, so those columns need extra handling with a mask:
ini = np.argmax(d, axis=0)  # [10  8  6  4  2  0]
sec = (~d).all(axis=0)      # columns that are all False
ini[sec] = 1000             # sentinel; convert with ini.astype(float) first if you want np.nan instead
# [10  8  6  4  2  0]  (unchanged here, since every column contains a True)
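As a self-contained sketch of the full recipe with NaN for empty columns (using a smaller matrix than the one above, with a deliberately all-False last column):

```python
import numpy as np

# small illustrative matrix; the last column is all False on purpose
d = np.array([[False, True,  False],
              [True,  True,  False],
              [True,  False, False]])

first_true = np.argmax(d, axis=0).astype(float)  # index of first True per column (0 if none)
first_true[~d.any(axis=0)] = np.nan              # columns without any True get NaN
# first_true -> [1., 0., nan]
```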

Alternatively, we can iterate over the columns (i.e. over d.T), convert each column to a list, and use .index() to find the first True, falling back to NaN when a column contains none:
index_list = []
for column in d.T:
    column = column.tolist()
    index_list.append(column.index(True) if True in column else np.nan)

Related

Repeat elements from one array based on another

Given
a = np.array([1,2,3,4,5,6,7,8])
b = np.array(['a','b','c','d','e','f','g','h'])
c = np.array([1,1,1,4,4,4,8,8])
where a & b 'correspond' to each other, how can I use c to slice b to get d which 'corresponds' to c:
d = np.array(['a','a','a','d','d','d','h','h'])
I know how to do this by looping
d = np.empty_like(b)  # preallocate the output
for n in range(a.shape[0]):
    d[n] = b[np.argmax(a == c[n])]
but want to know if I can do this without loops.
Thanks in advance!
Since a is just position + 1, you can simply use
In [33]: b[c - 1]
Out[33]: array(['a', 'a', 'a', 'd', 'd', 'd', 'h', 'h'], dtype='<U1')
I'm tempted to leave it at that, since the a example isn't enough to distinguish it from the argmax approach.
But we can test all a against all c with:
In [36]: a[:,None]==c
Out[36]:
array([[ True, True, True, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, True, True, True, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, True, True]])
In [37]: (a[:,None]==c).argmax(axis=0)
Out[37]: array([0, 0, 0, 3, 3, 3, 7, 7])
In [38]: b[_]
Out[38]: array(['a', 'a', 'a', 'd', 'd', 'd', 'h', 'h'], dtype='<U1')
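Since the a in this example happens to be sorted, np.searchsorted is another loop-free option (sketched below; it assumes every value of c actually occurs in a):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
c = np.array([1, 1, 1, 4, 4, 4, 8, 8])

# for each value in c, find its position in the sorted a
d = b[np.searchsorted(a, c)]
# d -> ['a' 'a' 'a' 'd' 'd' 'd' 'h' 'h']
```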

Expand each "True" value in pandas DataFrame of bools to a "True-Block" of a fixed length

I have a pandas Dataframe of bool values like this:
df = pd.DataFrame(
index=range(10),
data={
'A': [False, False, True, False, False, False, False, False, True, False],
'B': [True, False, True, True, True, False, False, False, False, False]
}
)
I want to expand each True value to a "True block" of at least length n=3, extending it forward from the original True value (or by fewer than n if we are at the end of the DataFrame; see column A below). Per column, the desired result is computed as: for each True, make sure that the next n-1 values are also True. So the desired output would be
desired = pd.DataFrame(
index=range(10),
data={
'A': [False, False, True, True, True, False, False, False, True, True],
'B': [True, True, True, True, True, True, True, False, False, False]
}
)
It seems to be a simple problem asking for a one-liner but I cannot get a pandas-like and efficient solution.
I found this related question, but as I am not bound by date intervals, it does not exactly apply here.
UPDATE:
In [97]: df.replace(False, np.nan).ffill(limit=2).fillna(False).astype(bool)
Out[97]:
A B
0 False True
1 False True
2 True True
3 True True
4 True True
5 False True
6 False True
7 False False
8 True False
9 True False
Old answer:
In [55]: idx = df.loc[df.B].index
In [57]: df.loc[idx.union(idx+1).union(idx+2), 'B'] = True
In [58]: df
Out[58]:
A B
0 False True
1 False True
2 True True
3 False True
4 False True
5 False True
6 False True
7 False False
8 True False
9 False False
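Another option, sketched here on the example data, is a backward-looking rolling maximum: a position should end up True exactly when any of the previous n positions (itself included) holds a True:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [False, False, True, False, False, False, False, False, True, False],
    'B': [True, False, True, True, True, False, False, False, False, False],
})

n = 3
# rolling max over a window of n rows ending at each position:
# True iff any True appeared in the last n rows of that column
expanded = df.astype(float).rolling(n, min_periods=1).max().astype(bool)
```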

Puzzled by TensorFlow nn.in_top_k output

I modified tensorflow convnet tutorial
to train just two classes.
Then I evaluated the model using cifar10_eval.py
I tried to understand the output of tf.nn.in_top_k
top_k_op = tf.nn.in_top_k(logits, labels, 1)
which is printed out as:
in_top_k output:::
[array([ True, False, True, False, True, True, True, True, True, True], dtype=bool)]
while the true labels(two classes, 10 images) are:::
[0 1 1 1 1 1 1 1 1 0]
and the logits are:::
[[ 1.45472026 -1.46666598]
[-1.0181191 1.03441548]
[-1.02658665 1.04306769]
[-1.19205511 1.21065331]
[-1.22167087 1.24064851]
[-0.89583808 0.91119087]
[-0.17517655 0.18206072]
[-0.09379113 0.09957675]
[-1.05578279 1.07254183]
[ 0.73048806 -0.73411369] ]
Question: Why the second and fourth nn.in_top_k() output are False instead of True?
It shouldn't happen.
I evaluated the example you gave and got:
In [6]: top_k_op = tf.nn.in_top_k(logits, labels, 1)
In [7]: top_k_op.eval()
Out[7]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
By the way, you can substitute in_top_k(A, B, 1) with a simple argmax (casting labels to int64 to match argmax's return type):
In [14]: tf.equal(tf.argmax(logits, 1), tf.cast(labels, tf.int64)).eval()
Out[14]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
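The same top-1 check is easy to reproduce in plain NumPy with the logits and labels from the question, which confirms that every prediction matches its label:

```python
import numpy as np

logits = np.array([[ 1.45472026, -1.46666598],
                   [-1.0181191 ,  1.03441548],
                   [-1.02658665,  1.04306769],
                   [-1.19205511,  1.21065331],
                   [-1.22167087,  1.24064851],
                   [-0.89583808,  0.91119087],
                   [-0.17517655,  0.18206072],
                   [-0.09379113,  0.09957675],
                   [-1.05578279,  1.07254183],
                   [ 0.73048806, -0.73411369]])
labels = np.array([0, 1, 1, 1, 1, 1, 1, 1, 1, 0])

# a sample is "in top 1" iff the highest logit is at the label's index
in_top1 = logits.argmax(axis=1) == labels
# in_top1 -> all True for this data
```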

numpy.genfromtxt cannot read boolean data correctly

No matter what the input values are, np.genfromtxt always returns False.
With dtype='u1' I get 1 as expected, but with dtype='b1' (NumPy's bool) I get False.
I don't know if this is a bug or not, but so far, I've been able to get dtype=bool to work (without an explicit converter) only if the file contains the literal strings 'False' and 'True':
In [21]: bool_lines = ['False,False', 'False,True', 'True,False', 'True,True']
In [22]: genfromtxt(bool_lines, delimiter=',', dtype=bool)
Out[22]:
array([[False, False],
[False, True],
[ True, False],
[ True, True]], dtype=bool)
If your data is 0s and 1s, you can read it as integers and then convert to bool:
In [26]: bits = ['0,0', '0,1', '1,0', '1,1']
In [27]: genfromtxt(bits, delimiter=',', dtype=np.uint8).astype(bool)
Out[27]:
array([[False, False],
[False, True],
[ True, False],
[ True, True]], dtype=bool)
Or you can use a converter for each column:
In [28]: cnv = lambda s: bool(int(s))
In [29]: converters = {0: cnv, 1: cnv}
In [30]: genfromtxt(bits, delimiter=',', dtype=bool, converters=converters)
Out[30]:
array([[False, False],
[False, True],
[ True, False],
[ True, True]], dtype=bool)

Update numpy array row by condition

I need help :) I have a NumPy array:
False False False False
False True True False
True True True True
False True True False
False False False False
How can I get this (take the first and last rows that contain a True, and set all of their elements to False)?
False False False False
False False False False
True True True True
False False False False
False False False False
arr[arr.any(axis=1).nonzero()[0][[0,-1]]] = False
How it works:
In [19]: arr
Out[19]:
array([[False, False, False, False],
[False, True, True, False],
[ True, True, True, True],
[False, True, True, False],
[False, False, False, False]], dtype=bool)
arr.any(axis=1) finds which rows contain a True value:
In [20]: arr.any(axis=1)
Out[20]: array([False, True, True, True, False], dtype=bool)
nonzero returns a tuple (one item for each axis) of indices of the True rows:
In [21]: arr.any(axis=1).nonzero()
Out[21]: (array([1, 2, 3]),)
We can use indexing to find the index of the first and last row containing a True value:
In [22]: arr.any(axis=1).nonzero()[0][[0,-1]]
Out[22]: array([1, 3])
And finally, we can set those rows to False with
In [23]: arr[arr.any(axis=1).nonzero()[0][[0,-1]]] = False
In [24]: arr
Out[24]:
array([[False, False, False, False],
[False, False, False, False],
[ True, True, True, True],
[False, False, False, False],
[False, False, False, False]], dtype=bool)
In case you meant "first and last" only in reference to the particular example ...
If every row that contains both True and False values should be set to False, then you shouldn't restrict yourself to the "first and last" of these rows, and the solution is much simpler. Using the fact that ~arr.all(1) tells you which rows are not all True, you can set those rows to False with:
arr[~arr.all(1)] = False
or, to avoid redundantly setting rows of entirely False to False, use exclusive or, ^:
arr[arr.any(1) ^ arr.all(1)] = False
which will be faster in some circumstances.
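A quick check of the exclusive-or version on the array from the question:

```python
import numpy as np

arr = np.array([[False, False, False, False],
                [False, True,  True,  False],
                [True,  True,  True,  True],
                [False, True,  True,  False],
                [False, False, False, False]])

# any-but-not-all: rows containing both True and False
mixed = arr.any(axis=1) ^ arr.all(axis=1)
arr[mixed] = False
# only the all-True row (row 2) keeps its True values
```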