NumPy bug using .any()?

I'm having the following error using NumPy:
>>> distance = 0.9014179933248182
>>> min_distance = np.array([0.71341723, 0.07322284])
>>> distance < min_distance
array([False, False])
which is right, but when I try:
>>> distance < min_distance.any()
True
which is obviously wrong, since no number in 'min_distance' is larger than 'distance'.
What is going on here? I'm using NumPy on Google Colab, on version '1.17.3'.

Whilst numpy bugs are common, this is not one. Note that min_distance.any() returns a boolean result. So in this expression:
distance < min_distance.any()
you are comparing a float with a boolean, which unfortunately works, because of a comedy of errors:
bool is a subclass of int
True is equal to 1
floats are comparable with integers.
E.g.
>>> 0.9 < True
True
>>> 1.1 < True
False
What you wanted instead:
>>> (distance < min_distance).any()
False
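The difference between the two spellings can be checked step by step. This is a minimal sketch that reuses the values from the question; the variable names collapsed, misleading and intended are only illustrative.

```python
import numpy as np

distance = 0.9014179933248182
min_distance = np.array([0.71341723, 0.07322284])

# .any() on the raw array asks "is any element non-zero?" and
# collapses the whole array to a single boolean.
collapsed = min_distance.any()

# Comparing a float with that boolean treats True as 1, so this is 0.90... < 1.
misleading = distance < collapsed

# Parenthesizing first builds the element-wise mask and only then reduces it.
intended = (distance < min_distance).any()

print(collapsed, misleading, intended)
```

Only the last expression answers the question "is distance smaller than at least one element?".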

Try (distance < min_distance).any() instead.

Related

making numpy binary file data to two decimal [duplicate]

I have a numpy array, something like below:
data = np.array([ 1.60130719e-01, 9.93827160e-01, 3.63108206e-04])
and I want to round each element to two decimal places.
How can I do so?
Numpy provides two identical methods to do this. Either use
np.round(data, 2)
or
np.around(data, 2)
as they are equivalent.
See the documentation for more information.
Examples:
>>> import numpy as np
>>> a = np.array([0.015, 0.235, 0.112])
>>> np.round(a, 2)
array([0.02, 0.24, 0.11])
>>> np.around(a, 2)
array([0.02, 0.24, 0.11])
>>> np.round(a, 1)
array([0. , 0.2, 0.1])
If you want the output to be
array([1.6e-01, 9.9e-01, 3.6e-04])
the problem is not really a missing feature of NumPy, but rather that this sort of rounding is not a standard thing to do. You can make your own rounding function which achieves this like so:
def my_round(value, N):
    exponent = np.ceil(np.log10(value))
    return 10**exponent*np.round(value*10**(-exponent), N)
For a general solution handling 0 and negative values as well, you can do something like this:
def my_round(value, N):
    value = np.asarray(value).copy()
    zero_mask = (value == 0)
    value[zero_mask] = 1.0
    sign_mask = (value < 0)
    value[sign_mask] *= -1
    exponent = np.ceil(np.log10(value))
    result = 10**exponent*np.round(value*10**(-exponent), N)
    result[sign_mask] *= -1
    result[zero_mask] = 0.0
    return result
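For comparison, here is a sketch of a closely related variant that rounds to a number of significant digits. The name round_sig and its sig parameter are hypothetical, and it uses floor rather than ceil for the exponent, so it is not a drop-in replacement for my_round above.

```python
import numpy as np

def round_sig(values, sig):
    """Round each element to `sig` significant digits (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    result = np.zeros_like(values)
    nonzero = values != 0  # zeros stay zero and never reach log10
    # Decimal exponent of each non-zero magnitude, e.g. -4 for 3.6e-04.
    exponent = np.floor(np.log10(np.abs(values[nonzero])))
    # Scale so that exactly `sig` digits sit left of the decimal point.
    scale = 10.0 ** (sig - 1 - exponent)
    result[nonzero] = np.round(values[nonzero] * scale) / scale
    return result

data = np.array([1.60130719e-01, 9.93827160e-01, 3.63108206e-04, 0.0, -1.57376965e-03])
rounded = round_sig(data, 2)
print(rounded)
```

The sign is preserved because np.round is applied to the signed value, while only the magnitude is used to find the exponent.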
It is worth noting that the accepted answer will round small floats down to zero as demonstrated below:
>>> import numpy as np
>>> arr = np.asarray([2.92290007e+00, -1.57376965e-03, 4.82011728e-08, 1.92896977e-12])
>>> print(arr)
[ 2.92290007e+00 -1.57376965e-03 4.82011728e-08 1.92896977e-12]
>>> np.round(arr, 2)
array([ 2.92, -0. , 0. , 0. ])
You can use set_printoptions and a custom formatter to fix this and get a more numpy-esque printout with fewer decimal places:
>>> np.set_printoptions(formatter={'float': "{0:0.2e}".format})
>>> print(arr)
[2.92e+00 -1.57e-03 4.82e-08 1.93e-12]
This way, you get the full versatility of format and maintain the precision of numpy's datatypes.
Also note that this only affects printing, not the actual precision of the stored values used for computation.
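Since set_printoptions changes the print settings globally, it may be preferable to scope the formatting. The sketch below shows two options that I believe are available in NumPy 1.15 and later: np.array2string for a single array, and the np.printoptions context manager for a block of code.

```python
import numpy as np

arr = np.asarray([2.92290007e+00, -1.57376965e-03, 4.82011728e-08, 1.92896977e-12])

# array2string formats one array without touching the global print settings.
text = np.array2string(arr, formatter={'float_kind': lambda x: f"{x:.2e}"})
print(text)

# np.printoptions is a context manager: the formatter applies only inside the block.
with np.printoptions(formatter={'float': "{0:0.2e}".format}):
    inside = str(arr)
outside = str(arr)  # back to the default formatting here

print(inside)
print(outside)
```

In both cases the stored values keep their full precision; only the text representation changes.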

Pandas: Fast way to get cols/rows containing na

In Pandas we can drop cols/rows by .dropna(how = ..., axis = ...) but is there a way to get an array-like of True/False indicators for each col/row, which would indicate whether a col/row contains na according to how and axis arguments?
I.e. is there a way to convert .dropna(how = ..., axis = ...) to a method, which would instead of actual removal just tell us, which cols/rows would be removed if we called .dropna(...) with specific how and axis.
Thank you for your time!
You can use isna() to replicate the behaviour of dropna without actually removing data. To mimic the 'how' and 'axis' parameter, you can add any() or all() and set the axis accordingly.
Here is a simple example:
import pandas as pd
df = pd.DataFrame([[pd.NA, pd.NA, 1], [pd.NA, pd.NA, pd.NA]])
df.isna()
Output:
      0     1      2
0  True  True  False
1  True  True   True
Columns that dropna(how='any', axis=1) would remove (reducing over axis=0 gives one flag per column):
df.isna().any(axis=0)
Output:
0 True
1 True
2 True
dtype: bool
Rows that dropna(how='any', axis=0) would remove (reducing over axis=1 gives one flag per row):
df.isna().any(axis=1)
Output:
0 True
1 True
dtype: bool
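The mapping from how and axis to a mask can be wrapped in a small helper. This is a sketch only: would_drop is a hypothetical name, it ignores dropna's thresh and subset parameters, and it relies on the fact that dropna(axis=0) removes rows, so the reduction has to run along the opposite axis.

```python
import numpy as np
import pandas as pd

def would_drop(df, how="any", axis=0):
    """Boolean Series marking the rows (axis=0) or columns (axis=1)
    that df.dropna(how=how, axis=axis) would remove (illustrative helper)."""
    na = df.isna()
    # dropna(axis=0) removes rows, so NA flags are reduced across columns
    # (axis=1) to get one flag per row, and vice versa.
    reduce_axis = 1 - axis
    if how == "any":
        return na.any(axis=reduce_axis)
    return na.all(axis=reduce_axis)

df = pd.DataFrame([[np.nan, np.nan, 1.0], [np.nan, np.nan, np.nan]])
rows = would_drop(df, how="any", axis=0)   # one flag per row
cols = would_drop(df, how="all", axis=1)   # one flag per column
print(rows)
print(cols)
```

The labels that survive the real dropna call should be exactly those where the mask is False, which is an easy way to sanity-check the helper on your own data.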

How do we replace log(0) with 0?

RuntimeWarning: invalid value encountered in multiply
I have a code:
a = Y_list * np.log(Y_list/E_Y)
print(a)
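My Y_list contains 0 values. How can I make np.log return 0 wherever Y_list is 0?
You can use np.where. It lets you define a condition and pick different values where it is true or false. Note that np.log is still evaluated for every element first, so the log(0) warning is still emitted; only the resulting value is replaced: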
np.where((Y_list/E_Y)!= 0, np.log(Y_list/E_Y),0)
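Alternatively, we can run np.log with a where parameter. Pass an initialized out array as well, because entries where the condition is False are otherwise left uninitialized:
import numpy as np
a = np.arange(0, 5000, 1000)
np.log(a, out=np.zeros(a.shape), where=a != 0)
# array([0. , 6.90775528, 7.60090246, 8.00636757, 8.29404964])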
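Applied to the expression from the question, the where/out approach might look like the sketch below. The sample values for Y_list and E_Y are made up for illustration, and np.errstate(all="raise") is there only to show that no floating-point warning is triggered.

```python
import numpy as np

# Hypothetical sample data; the question does not show the real arrays.
Y_list = np.array([0.0, 1.0, 2.0, 4.0])
E_Y = np.array([1.0, 1.0, 1.0, 2.0])

with np.errstate(all="raise"):  # any log(0) or 0 * inf would raise here
    ratio = Y_list / E_Y
    # Pre-fill with zeros so positions skipped by `where` hold a defined value.
    log_ratio = np.zeros_like(ratio)
    np.log(ratio, out=log_ratio, where=ratio > 0)
    a = Y_list * log_ratio

print(a)
```

If SciPy is available, scipy.special.xlogy(Y_list, Y_list / E_Y) computes x * log(y) with the same convention of returning 0 where x is 0.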

Numpy: fuzzy 'greater_than' operator, working on list of values (requesting advice on existing code)

I have implemented a numpy function that:
takes as inputs:
    an n (rows) x m (columns) array of floats,
    a threshold (float);
for each row:
    if the max value of the row is larger than or equal to threshold,
    and this max value is not preceded in the same row by a min value lower than or equal to -threshold,
    then the row is flagged True (larger than),
    else the row is flagged False (not larger than);
then returns this n (rows) x 1 (column) array of booleans.
What I have implemented works (at least on the provided example), but I am far from being an expert in numpy, and I wonder whether there is a more efficient way of handling this (possibly avoiding the miscellaneous transpose & tile, for instance).
I would gladly accept any advice on how to make this function more efficient and/or readable.
import numpy as np
import pandas as pd
# Test data
threshold=0.02 #2%
df = pd.DataFrame({'variation_1': [0.01, 0.02, 0.005, -0.02, -0.01, -0.01],
'variation_2': [-0.01, 0.08, 0.08, 0.01, -0.02, 0.01],
'variation_3': [0.005, -0.03, -0.03, 0.002, 0.025, -0.03],
})
data = df.values
Checking expected results:
In [75]: df
Out[75]:
variation_1 variation_2 variation_3 # Expecting
0 0.010 -0.01 0.005 # False (no value larger than threshold)
1 0.020 0.08 -0.030 # True (1st value equal to threshold)
2 0.005 0.08 -0.030 # True (2nd value larger than threshold)
3 -0.020 0.01 0.002 # False (no value larger than threshold)
4 -0.010 -0.02 0.025 # False (2nd value lower than -threshold)
5 -0.010 0.01 -0.030 # False (no value larger than threshold)
Current function.
def greater_than(data: np.ndarray, threshold: float) -> np.ndarray:
    # Step 1.
    # Filtering out from 'low_max' mask the rows which 'max' is not greater than or equal
    # to 'threshold'. 'low_max' is reshaped like input array for use in next step.
    data_max = np.amax(data, axis=1)
    low_max = np.transpose([data_max >= threshold] * data.shape[1])
    # Step 2.
    # Filtering values preceding max of each row
    max_idx = np.argmax(data, axis=1) # Get idx of max.
    max_idx = np.transpose([max_idx] * data.shape[1]) # Reshape like input array.
    # Create an array of index.
    idx_array = np.tile(np.arange(data.shape[1]), (data.shape[0],1))
    # Keep indices lower than index of max for each row, and filter out rows with
    # a max too low vs 'threshold' (from step 1).
    mask_max = (idx_array <= max_idx) & (low_max)
    # Step 3.
    # On a masked array re-using mask from step 2 to filter out unqualifying values,
    # filter out rows with a 'min' preceding the 'max' and that are lower than or
    # equal to '-threshold'.
    data = np.ma.array(data, mask=~mask_max)
    data_min = np.amin(data, axis=1)
    mask_min = data_min > -threshold
    # Return 'mask_min', filling masked values with 'False'.
    return np.ma.filled(mask_min, False)
Results.
res = greater_than(data, threshold)
In [78]:res
Out[78]: array([False, True, True, False, False, False])
Thanks in advance for any advice!
lesser = data <= -threshold
greater = data >= threshold
idx_lesser = np.argmax(lesser, axis=1)
idx_greater = np.argmax(greater, axis=1)
has_lesser = np.any(lesser, axis=1)
has_greater = np.any(greater, axis=1)
output = has_greater * (has_lesser * (idx_lesser > idx_greater) + np.logical_not(has_lesser))
yields your expected output on your data and should be quite fast. Also, I'm not entirely sure I understand your explanation so if this doesn't work on your actual data let me know.
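The snippet above can be wrapped into a function and checked against the expected flags from the question. This is a sketch: the name greater_than_first_crossing is made up, and it uses & and | in place of * and +, which is equivalent for boolean arrays. It looks at the first value that reaches threshold rather than at the row maximum, so it may differ from the original function on other data.

```python
import numpy as np

def greater_than_first_crossing(data, threshold):
    """Flag rows that reach +threshold before reaching -threshold (illustrative)."""
    lesser = data <= -threshold
    greater = data >= threshold
    has_lesser = lesser.any(axis=1)
    has_greater = greater.any(axis=1)
    # argmax on a boolean array returns the index of the first True
    # (or 0 when there is none, hence the has_* guards).
    idx_lesser = lesser.argmax(axis=1)
    idx_greater = greater.argmax(axis=1)
    return has_greater & (~has_lesser | (idx_lesser > idx_greater))

# Test data copied from the question.
data = np.array([
    [0.01, -0.01, 0.005],
    [0.02, 0.08, -0.03],
    [0.005, 0.08, -0.03],
    [-0.02, 0.01, 0.002],
    [-0.01, -0.02, 0.025],
    [-0.01, 0.01, -0.03],
])
flags = greater_than_first_crossing(data, 0.02)
print(flags)
```

No tiling, transposing or masked arrays are needed, because everything is expressed as per-row reductions.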

Why does this numpy array comparison fail?

I try to compare the results of some numpy.array calculations with expected results, and I constantly get false comparison, but the printed arrays look the same, e.g:
def test_gen_sine():
    A, f, phi, fs, t = 1.0, 10.0, 1.0, 50.0, 0.1
    expected = array([0.54030231, -0.63332387, -0.93171798, 0.05749049, 0.96724906])
    result = gen_sine(A, f, phi, fs, t)
    npt.assert_array_equal(expected, result)
prints back:
> raise AssertionError(msg)
E AssertionError:
E Arrays are not equal
E
E (mismatch 100.0%)
E x: array([ 0.540302, -0.633324, -0.931718, 0.05749 , 0.967249])
E y: array([ 0.540302, -0.633324, -0.931718, 0.05749 , 0.967249])
My gen_sine function is:
def gen_sine(A, f, phi, fs, t):
    sampling_period = 1 / fs
    num_samples = fs * t
    samples_range = (np.arange(0, num_samples) * 2 * f * np.pi * sampling_period) + phi
    return A * np.cos(samples_range)
Why is that? How should I compare the two arrays?
(I'm using numpy 1.9.3 and pytest 2.8.1)
The problem is that np.testing.assert_array_equal returns None and performs the assertion internally. It is incorrect to preface it with a separate assert as you do:
assert np.testing.assert_array_equal(x, y)
Instead, in your test you would just do something like:
import numpy as np
from numpy.testing import assert_array_equal
def test_equal():
    assert_array_equal(np.arange(0, 3), np.array([0, 1, 2]))  # Passes: no AssertionError raised
    assert_array_equal(np.arange(0, 3), np.array([2, 0, 1]))  # Raises AssertionError
Update:
A few comments
Don't rewrite your entire original question, because it then becomes unclear what an answer was actually addressing.
As for your updated question, the issue is that assert_array_equal is not appropriate for comparing floating point arrays, as explained in the documentation. Instead, use assert_allclose and set the desired relative and absolute tolerances.
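A tolerance-based version of the test might look like the sketch below. gen_sine and the expected values are copied from the question; the tolerance atol=1e-8 is an assumption, chosen because the expected values are written with eight decimal places.

```python
import numpy as np
from numpy.testing import assert_allclose

def gen_sine(A, f, phi, fs, t):
    sampling_period = 1 / fs
    num_samples = fs * t
    samples_range = (np.arange(0, num_samples) * 2 * f * np.pi * sampling_period) + phi
    return A * np.cos(samples_range)

expected = np.array([0.54030231, -0.63332387, -0.93171798, 0.05749049, 0.96724906])
result = gen_sine(1.0, 10.0, 1.0, 50.0, 0.1)

# Exact comparison fails: `expected` holds rounded literals, while `result`
# carries full double precision, even though both print identically.
exactly_equal = np.array_equal(result, expected)

# Passes silently when every element agrees within the absolute tolerance.
assert_allclose(result, expected, rtol=0, atol=1e-8)
print(exactly_equal)
```

The printed arrays in the failure message look the same only because the display is truncated to six decimals; the stored values differ in later digits.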