No matter what the input value is, np.genfromtxt always returns False. With dtype='u1' I get 1 as expected, but with dtype='b1' (NumPy's bool) I get False.
I don't know if this is a bug or not, but so far, I've been able to get dtype=bool to work (without an explicit converter) only if the file contains the literal strings 'False' and 'True':
In [21]: bool_lines = ['False,False', 'False,True', 'True,False', 'True,True']
In [22]: genfromtxt(bool_lines, delimiter=',', dtype=bool)
Out[22]:
array([[False, False],
       [False,  True],
       [ True, False],
       [ True,  True]], dtype=bool)
If your data is 0s and 1s, you can read it as integers and then convert to bool:
In [26]: bits = ['0,0', '0,1', '1,0', '1,1']
In [27]: genfromtxt(bits, delimiter=',', dtype=np.uint8).astype(bool)
Out[27]:
array([[False, False],
       [False,  True],
       [ True, False],
       [ True,  True]], dtype=bool)
Or you can use a converter for each column:
In [28]: cnv = lambda s: bool(int(s))
In [29]: converters = {0: cnv, 1: cnv}
In [30]: genfromtxt(bits, delimiter=',', dtype=bool, converters=converters)
Out[30]:
array([[False, False],
       [False,  True],
       [ True, False],
       [ True,  True]], dtype=bool)
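If the file has many columns, building the converters dictionary by hand gets tedious. Assuming you know the column count, a dict comprehension works just as well (a sketch; the file contents here are made up):

```python
import numpy as np
from io import StringIO

# Hypothetical 4-column file of 0/1 values
data = StringIO("0,1,1,0\n1,0,0,1\n")
ncols = 4

# One bool(int(s)) converter per column index
converters = {i: (lambda s: bool(int(s))) for i in range(ncols)}
arr = np.genfromtxt(data, delimiter=',', dtype=bool, converters=converters)
print(arr.tolist())  # [[False, True, True, False], [True, False, False, True]]
```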
Related
I have the following matrix defined:
d = np.array(
    [[False, False, False, False, False,  True],
     [False, False, False, False, False,  True],
     [False, False, False, False,  True,  True],
     [False, False, False, False,  True,  True],
     [False, False, False,  True,  True,  True],
     [False, False, False,  True,  True,  True],
     [False, False,  True,  True,  True,  True],
     [False, False,  True,  True,  True,  True],
     [False,  True,  True,  True,  True,  True],
     [False,  True,  True,  True,  True,  True],
     [ True,  True,  True,  True,  True,  True],
     [ True,  True,  True,  True,  True,  True],
     [ True,  True,  True,  True,  True,  True],
     [False,  True,  True,  True,  True,  True],
     [False, False,  True,  True,  True,  True],
     [False, False, False,  True,  True,  True],
     [False, False, False, False,  True,  True],
     [False, False, False, False, False,  True],
     [False, False, False, False,  True,  True],
     [False, False, False,  True,  True,  True],
     [False, False,  True,  True,  True,  True],
     [False,  True,  True,  True,  True,  True],
     [ True,  True,  True,  True,  True,  True]])
And I would like to get a vector of length 6 containing the index of the first True occurrence in each column.
So the expected output would be:
fo = np.array([10, 8, 6, 4, 2, 0])
If a given column contains no True values, it should ideally return NaN for that column.
I have tried:
np.sum(d, axis=0)
array([ 4, 8, 12, 16, 20, 23])
which, together with the column length, would give the index, but that only works if each column consists of exactly two contiguous regions: False on top, True below.
You can do this with argmax, which finds the index of the first True in each column. Columns that are all False need extra care, because argmax also returns 0 for them, so you have to find those columns separately and fix up the result. For example, if the first column were all False:
# argmax returns 0 both for a first True in row 0 and for an all-False column,
# so all-False columns need a second pass using a mask
ini = np.argmax(d == 1, 0)  # [0 8 6 4 2 0]
sec = (d == 0).all(0)       # columns that are all False
ini[sec] = 1000             # sentinel; use .astype(object) first if you want to fill with NaN instead
# ini -> [1000 8 6 4 2 0]
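To actually return NaN for all-False columns, as the question asked, one option is to make the result a float array, since NaN is a float (a sketch, using a small example array rather than the question's d):

```python
import numpy as np

d = np.array([[False, False, True],
              [True,  False, True]])

# First True per column; argmax returns 0 for all-False columns too
first = np.argmax(d, axis=0).astype(float)
first[~d.any(axis=0)] = np.nan   # columns with no True -> NaN
print(first)                     # [ 1. nan  0.]
```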
Alternatively, you can iterate over the columns (d.T), convert each column to a list, and use .index() to find the position of its first True (note that NumPy arrays have no .index() method, so the list conversion is required):
index_list = []
for col in d.T:                       # iterate over columns
    col = list(col)
    if True in col:
        index_list.append(col.index(True))
    else:
        index_list.append(None)       # no True in this column
Given
a = np.array([1,2,3,4,5,6,7,8])
b = np.array(['a','b','c','d','e','f','g','h'])
c = np.array([1,1,1,4,4,4,8,8])
where a & b 'correspond' to each other, how can I use c to slice b to get d which 'corresponds' to c:
d = np.array(['a','a','a','d','d','d','h','h'])
I know how to do this by looping
for n in range(a.shape[0]):
    d[n] = b[np.argmax(a == c[n])]
but want to know if I can do this without loops.
Thanks in advance!
With an a that is just position + 1, you can simply use
In [33]: b[c - 1]
Out[33]: array(['a', 'a', 'a', 'd', 'd', 'd', 'h', 'h'], dtype='<U1')
I'm tempted to leave it at that, since the a example isn't enough to distinguish it from the argmax approach.
But we can test all a against all c with:
In [36]: a[:,None]==c
Out[36]:
array([[ True,  True,  True, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False,  True,  True,  True, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False,  True,  True]])
In [37]: (a[:,None]==c).argmax(axis=0)
Out[37]: array([0, 0, 0, 3, 3, 3, 7, 7])
In [38]: b[_]
Out[38]: array(['a', 'a', 'a', 'd', 'd', 'd', 'h', 'h'], dtype='<U1')
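Since a in the example is sorted, np.searchsorted is another loop-free option that avoids building the O(n²) comparison matrix (a sketch; it assumes a is sorted and that every value in c actually occurs in a):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
c = np.array([1, 1, 1, 4, 4, 4, 8, 8])

# Position where each value of c is found in the sorted a
idx = np.searchsorted(a, c)
d = b[idx]
print(d)   # ['a' 'a' 'a' 'd' 'd' 'd' 'h' 'h']
```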
I was reading a book on Data Analysis with Python where there's a topic on Boolean Indexing.
This is the Code given in the Book:
>>> import numpy as np
>>> names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
>>> data = np.random.randn(7,4)
>>> names
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')
>>> data
array([[ 0.35214065, -0.6258314 , -1.18156785, -0.75981437],
       [-0.54500574, -0.21700484,  0.34375588, -0.99216205],
       [ 0.29883509, -3.08641931,  0.61289669,  0.58233649],
       [ 0.32047465,  0.05380018, -2.29797299,  0.04553794],
       [ 0.35764077, -0.51405297, -0.21406197, -0.88982479],
       [-0.59219242, -1.87402141, -2.66339726,  1.30208623],
       [ 0.32612407,  0.19612659, -0.63334406,  1.0275622 ]])
>>> names == 'Bob'
array([ True, False, False, True, False, False, False])
Up to this point it's perfectly clear, but I'm unable to understand what happens when they do data[names == 'Bob']:
>>> data[names == 'Bob']
array([[ 0.35214065, -0.6258314 , -1.18156785, -0.75981437],
       [ 0.32047465,  0.05380018, -2.29797299,  0.04553794]])
>>> data[names == 'Bob', 2:]
array([[-1.18156785, -0.75981437],
       [-2.29797299,  0.04553794]])
How is this happening?
data[names == 'Bob']
is the same as:
data[[True, False, False, True, False, False, False]]
And this just means: take row 0 and row 3 from data (the positions where the mask is True).
data[names == 'Bob',2:]
gives the same rows, but restricts the columns to those from column 2 onward. The part before the comma selects rows; the part after the comma selects columns.
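A small self-contained example makes the row/column split easy to verify (using a fixed array instead of random data, so the output is predictable):

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
data = np.arange(16).reshape(4, 4)   # rows 0..3, columns 0..3

mask = names == 'Bob'    # [True, False, False, True]
print(data[mask])        # rows 0 and 3: [[0 1 2 3], [12 13 14 15]]
print(data[mask, 2:])    # rows 0 and 3, columns 2 and 3: [[2 3], [14 15]]
```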
I modified the tensorflow convnet tutorial to train just two classes, then evaluated the model using cifar10_eval.py.
I tried to understand the output of tf.nn.in_top_k (line 128):
top_k_op = tf.nn.in_top_k(logits, labels, 1)
which is printed out as:
in_top_k output:
[array([ True, False,  True, False,  True,  True,  True,  True,  True,  True], dtype=bool)]
while the true labels (two classes, 10 images) are:
[0 1 1 1 1 1 1 1 1 0]
and the logits are:
[[ 1.45472026 -1.46666598]
 [-1.0181191   1.03441548]
 [-1.02658665  1.04306769]
 [-1.19205511  1.21065331]
 [-1.22167087  1.24064851]
 [-0.89583808  0.91119087]
 [-0.17517655  0.18206072]
 [-0.09379113  0.09957675]
 [-1.05578279  1.07254183]
 [ 0.73048806 -0.73411369]]
Question: Why are the second and fourth in_top_k() outputs False instead of True?
It shouldn't happen.
I evaluated the example you gave and got:
In [6]: top_k_op = tf.nn.in_top_k(logits, labels, 1)
In [7]: top_k_op.eval()
Out[7]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
By the way, you can substitute in_top_k(A, B, 1) with a simple argmax comparison:
In [14]: tf.equal(tf.argmax(logits, 1), tf.cast(labels, tf.int64)).eval()
Out[14]: array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
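For intuition, what in_top_k(logits, labels, 1) computes can be sketched in plain NumPy: with k=1, each entry is True exactly when the true label's class has the largest logit in that row (the logits and labels below are a truncated subset of the question's data):

```python
import numpy as np

logits = np.array([[ 1.45, -1.47],
                   [-1.02,  1.03],
                   [ 0.73, -0.73]])
labels = np.array([0, 1, 0])

# k=1 case of in_top_k: is the predicted class the true class?
correct = np.argmax(logits, axis=1) == labels
print(correct)   # [ True  True  True]
```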
I need help. I have a NumPy array:
False False False False
False True True False
True True True True
False True True False
False False False False
How can I get the following (take the first and last rows that contain a True, and set all of their elements to False)?
False False False False
False False False False
True True True True
False False False False
False False False False
arr[arr.any(axis=1).nonzero()[0][[0,-1]]] = False
How it works:
In [19]: arr
Out[19]:
array([[False, False, False, False],
       [False,  True,  True, False],
       [ True,  True,  True,  True],
       [False,  True,  True, False],
       [False, False, False, False]], dtype=bool)
arr.any(axis=1) finds which rows contain a True value:
In [20]: arr.any(axis=1)
Out[20]: array([False, True, True, True, False], dtype=bool)
nonzero returns a tuple (one item for each axis) of indices of the True rows:
In [21]: arr.any(axis=1).nonzero()
Out[21]: (array([1, 2, 3]),)
We can use indexing to find the index of the first and last row containing a True value:
In [22]: arr.any(axis=1).nonzero()[0][[0,-1]]
Out[22]: array([1, 3])
And finally, we can set those rows to False with
In [23]: arr[arr.any(axis=1).nonzero()[0][[0,-1]]] = False
In [24]: arr
Out[24]:
array([[False, False, False, False],
       [False, False, False, False],
       [ True,  True,  True,  True],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)
In case you meant "first and last" only in reference to the particular example ...
If every row that contains both True and False values should be set to False, then you shouldn't restrict to the "first and last" of these rows, and the solution is much easier. Using the fact that ~a.all(1) will tell you which rows are not all True, you can set those rows to False with:
arr[~arr.all(1)] = False
or, to avoid redundantly setting rows of entirely False to False, use exclusive or, ^:
arr[arr.any(1) ^ arr.all(1)] = False
which will be faster in some circumstances.
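A quick check of the exclusive-or variant on the array from the question (a sketch):

```python
import numpy as np

arr = np.array([[False, False, False, False],
                [False,  True,  True, False],
                [ True,  True,  True,  True],
                [False,  True,  True, False],
                [False, False, False, False]])

# Rows that contain a True but are not all True
mixed = arr.any(1) ^ arr.all(1)   # [False, True, False, True, False]
arr[mixed] = False
print(arr)   # only the all-True row survives
```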