Strange behavior of a hyperoperator - raku

I found out this strange behavior of a hyperoperator:
say 0 != 0; # False
my @a = 0, 0, 0;
say @a «==» @a;  # [True True True]
say @a «!=» @a;  # [True True True] <--- why?
say @a «!==» @a; # [False False False]
The != infix operator is documented as equivalent to !==, but apparently it is not, at least when used in a hyperoperator.
The problem seems to be related to the specific value (zero), since using a different value returns the expected result.
Besides, using 0 but True works fine:
@a = 0 but True, 0 but True, 0 but True;
say @a «!=» @a; # [False False False]
I'm using Rakudo 2022.03.
Is this a bug or something I cannot grasp?

It most definitely is a bug, as this:
my @a = 0, 0, 0;
say HYPER(&infix:<<!=>>, @a, @a); # [False False False]
gives the correct result. However, what gets passed to HYPER is not the &infix:<<!=>>, but a code block that apparently wraps it (incorrectly).
Investigating further, but please, yes, this is a bug and as such should be reported :-)
EDIT: https://github.com/rakudo/rakudo/issues/4838
EDIT: https://github.com/rakudo/rakudo/pull/4839 fixes it, but am unsure about the way it fixes it :-)
EDIT: Fix has been approved by jnthn, merged. Will be in 2022.04. Thanks for noting!

Related

difference array.any() vs. np.any(array())

Simple question, confusing output:
np.array([-1, 0, 1]).any() < 0
out: False (why??)
but:
np.any(np.array([-1, 0, 1]))
out: True (as expected)
According to the documentation, both commands should be equivalent, but they aren't.
NumPy often provides both a method and a function that do exactly the same thing:
assert my_array.any() == numpy.any(my_array)
Here my_array.any() is the method and numpy.any(my_array) is the function.
Both return a Boolean. np.array([-1, 0, 1]).any() < 0 returns False because np.array([-1, 0, 1]).any() is True, which equals the value 1. You are then asking whether 1 < 0, which is False.
import numpy as np
my_array = np.array([-1, 0, 1])
assert my_array.any() == True
assert my_array.any() == 1
assert my_array.any() > 0
assert np.any(my_array) == True
assert my_array.any() == np.any(my_array)
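To answer the question the original code was aiming at ("is any element negative?"), do the comparison element-wise before the reduction. A minimal sketch, assuming the same three-element array; the variable names are illustrative only:

```python
import numpy as np

arr = np.array([-1, 0, 1])

# .any() collapses the array to a single Boolean first ...
collapsed = arr.any()          # True, since at least one element is non-zero
# ... so "< 0" then compares True (which equals 1) with 0
wrong = collapsed < 0          # False

# Compare element-wise first, then reduce the resulting Boolean array
right = (arr < 0).any()        # True: -1 is negative
same_right = np.any(arr < 0)   # function form, same result
print(wrong, right, same_right)
```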

Numpy bug using .any()?

I'm having the following error using NumPy:
>>> distance = 0.9014179933248182
>>> min_distance = np.array([0.71341723, 0.07322284])
>>> distance < min_distance
array([False, False])
which is right, but when I try:
>>> distance < min_distance.any()
True
which is obviously wrong, since there is no number in 'min_distance' smaller than 'distance'
What is going on here? I'm using NumPy on Google Colab, on version '1.17.3'.
Whilst numpy bugs are common, this is not one. Note that min_distance.any() returns a boolean result. So in this expression:
distance < min_distance.any()
you are comparing a float with a boolean, which unfortunately works, because of a comedy of errors:
bool is a subclass of int
True is equal to 1
floats are comparable with integers.
E.g.
>>> 0.9 < True
True
>>> 1.1 < True
False
What you wanted instead:
>>> (distance < min_distance).any()
False
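Both the three facts above and the ordering fix can be checked in a few lines. A small sketch reusing the values from the question; names such as reduced_first are made up for illustration:

```python
import numpy as np

# The facts behind the surprising result
print(issubclass(bool, int))  # True: bool is a subclass of int
print(True == 1)              # True
print(0.9 < True)             # True: evaluated as 0.9 < 1

distance = 0.9014179933248182
min_distance = np.array([0.71341723, 0.07322284])

# Reducing first yields one Boolean, and the float is compared with it
reduced_first = distance < min_distance.any()     # 0.90... < True, i.e. True
# Comparing first yields an element-wise mask, which .any() then reduces
compared_first = (distance < min_distance).any()  # False
```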
try (distance < min_distance).any()

How to interpret the result of pandas dataframe boolean index

The above operation seems a little trivial, but I am a little lost as to its output. Below is a piece of code to illustrate my point.
import numpy as np
import pandas as pd

# sample data for understanding the concept of boolean indexing:
d_f = pd.DataFrame({'x':[0,1,2,3,4,5,6,7,8,9], 'y':[10,12,13,14,15,16,1,2,3,5]})
# changing the index of the dataframe:
d_f = d_f.set_index([list('abcdefghig')])
# a list of data:
myL = np.array(range(1,11))
# loop through the data for boolean slicing and indexing:
for r in myL:
    DF2 = d_f['x'].values == r
The result of the above code is:
array([False, False, False, False, False, False, False, False, False, False], dtype=bool)
But every value in d_f['x'].values except 0 is also in myL. It therefore appears that the program was doing an index-for-index matching of the elements of myL and d_f['x'].values. Is this typical behavior of the pandas library? If so, can someone please explain the rationale behind it? Thank you in advance.
As @coldspeed states, you overwrite DF2 on every iteration, so after the loop it holds d_f['x'].values == 10 (the last value of r), which is all False.
What I think you are trying to do is this instead:
d_f['x'].isin(myL)
Output:
a False
b True
c True
d True
e True
f True
g True
h True
i True
g True
Name: x, dtype: bool
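The following sketch contrasts overwriting inside the loop with testing membership once. It assumes the same sample data but builds the index with index=list('abcdefghig') instead of set_index, which should give the same labels. The masks dictionary is a hypothetical name added for illustration:

```python
import numpy as np
import pandas as pd

d_f = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'y': [10, 12, 13, 14, 15, 16, 1, 2, 3, 5]},
                   index=list('abcdefghig'))
myL = np.arange(1, 11)

# Each iteration replaces DF2, so only the comparison with the last r (10) survives
for r in myL:
    DF2 = d_f['x'].values == r

# If one mask per value is really wanted, keep them instead of overwriting
masks = {r: d_f['x'].values == r for r in myL}

# One membership test for every row at once
in_list = d_f['x'].isin(myL)
# ... which can be used directly as a boolean index
selected = d_f[in_list]
```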

pylint, pandas : Comparison to True should be just 'expr' or 'expr is True' (singleton-comparison)

Did anyone solve this pylint issue when using pandas?
C:525,59: Comparison to True should be just 'expr' or 'expr is True' (singleton-comparison)
This happens on the line where I'm using:
df_current_dayparts_raw['is_standard'] == True
I tried these, but they didn't work:
df_current_dayparts_raw['is_standard'] is True
df_current_dayparts_raw['is_standard'].isin([True])
df_current_dayparts_raw['is_standard'].__eq__(True)
If you instantiate a DataFrame with the following code:
test = pd.DataFrame({"bool": [True, False, True], "val":[1,2,3]})
>>> test
    bool  val
0   True    1
1  False    2
2   True    3
then the following should return only the rows where "bool" is True:
test[test['bool']]
   bool  val
0  True    1
2  True    3
You do not need to write test['bool'] == True explicitly; test['bool'] on its own is enough. This should be pylint-compliant and satisfy singleton-comparison.
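Here is a short sketch of the same idea. It also shows Series.eq(True) as an explicit alternative for when you really want the comparison spelled out. As far as I know, pylint's singleton-comparison check only flags == / != against the literals True, False and None, so a method call should not be flagged. Treat that as an assumption:

```python
import pandas as pd

test = pd.DataFrame({"bool": [True, False, True], "val": [1, 2, 3]})

# A boolean column is already a mask, so it can index the frame directly
direct = test[test["bool"]]

# Explicit comparison without the literal "== True" that pylint complains about
explicit = test[test["bool"].eq(True)]

print(direct.equals(explicit))
```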
A bit late, but maybe this will be useful for someone. This worked for me:
if df_current_dayparts_raw['is_standard']:
    print("True")
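Be aware that pandas refuses to treat a multi-element Series as a single truth value. The if statement calls bool() on it, which raises ValueError ("The truth value of a Series is ambiguous"). The sketch below uses a hypothetical three-row stand-in for the is_standard column and reduces explicitly with .any() or .all() instead:

```python
import pandas as pd

# Hypothetical stand-in for df_current_dayparts_raw['is_standard']
is_standard = pd.Series([True, False, True])

try:
    if is_standard:  # Python calls bool() on the Series here
        print("True")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Say explicitly which reduction is meant
any_standard = bool(is_standard.any())  # True: at least one row is True
all_standard = bool(is_standard.all())  # False: not every row is True
```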
Instead of <your_expr> == True, try numpy.equal(<your_expr>, True). Concretely:
import numpy as np
np.equal(df_current_dayparts_raw['is_standard'], True)

When does advanced indexing on structured masked arrays *really* return a copy?

When I have a structured masked array with boolean indexing, under what conditions do I get a view and when do I get a copy? The documentation says that advanced indexing always returns a copy, but this is not true, since something like X[X>0]=42 is technically advanced indexing, but the assignment works. My situation is more complex:
I want to set the mask of a particular field based on a criterion from another field, so I need to get the field, apply the boolean indexing, and get the mask. There are 3! = 6 possible orders in which to do this.
Preparation:
In [83]: M = ma.MaskedArray(random.random(400).view("f8,f8,f8,f8")).reshape(10, 10)
In [84]: crit = M[:, 4]["f2"] > 0.5
Field - index - mask (fails):
In [85]: M["f3"][crit, 3].mask = True
In [86]: print(M["f3"][crit, 3].mask)
[False False False False False]
Index - field - mask (fails):
In [87]: M[crit, 3]["f3"].mask = True
In [88]: print(M[crit, 3]["f3"].mask)
[False False False False False]
Index - mask - field (fails):
In [94]: M[crit, 3].mask["f3"] = True
In [95]: print(M[crit, 3].mask["f3"])
[False False False False False]
Mask - index - field (fails):
In [101]: M.mask[crit, 3]["f3"] = True
In [102]: print(M.mask[crit, 3]["f3"])
[False False False False False]
Field - mask - index (succeeds):
In [103]: M["f3"].mask[crit, 3] = True
In [104]: print(M["f3"].mask[crit, 3])
[ True True True True True]
# set back to False so I can try method #6
In [105]: M["f3"].mask[crit, 3] = False
In [106]: print(M["f3"].mask[crit, 3])
[False False False False False]
Mask - field - index (succeeds):
In [107]: M.mask["f3"][crit, 3] = True
In [108]: print(M.mask["f3"][crit, 3])
[ True True True True True]
So, it looks like indexing must come last.
The issue of __setitem__ v. __getitem__ is important, but with structured array and masking it's a little harder to sort out when a __getitem__ is first making a copy.
Regarding the structured arrays, it shouldn't matter whether the field index occurs first or the element. However some releases appear to have a bug in this regard. I'll try to find a recent SO question where this was a problem.
With a masked array, there's the question of how to correctly modify the mask. The .mask attribute is a property that accesses the underlying ._mask array, so it is fetched through attribute access (the property getter) rather than through indexing. So the simple setitem vs. getitem distinction does not apply directly.
Let's set the structured part aside first:
In [584]: M = np.ma.MaskedArray(np.arange(4))
In [585]: M
Out[585]:
masked_array(data = [0 1 2 3],
mask = False,
fill_value = 999999)
In [586]: M.mask
Out[586]: False
In [587]: M.mask[[1,2]]=True
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-587-9010ee8f165e> in <module>()
----> 1 M.mask[[1,2]]=True
TypeError: 'numpy.bool_' object does not support item assignment
Initially mask is a scalar boolean, not an array.
This works
In [588]: M.mask=np.zeros((4,),bool) # change mask to array
In [589]: M
Out[589]:
masked_array(data = [0 1 2 3],
mask = [False False False False],
fill_value = 999999)
In [590]: M.mask[[1,2]]=True
In [591]: M
Out[591]:
masked_array(data = [0 -- -- 3],
mask = [False True True False],
fill_value = 999999)
This does not
In [592]: M[[1,2]].mask=True
In [593]: M
Out[593]:
masked_array(data = [0 -- -- 3],
mask = [False True True False],
fill_value = 999999)
M[[1,2]] is evidently the copy, and the assignment is to its mask attribute, not M.mask.
....
A masked array has a .__setmask__ method, which you can study in numpy/ma/core.py. The mask property is defined with
mask = property(fget=_get_mask, fset=__setmask__, doc="Mask")
So M.mask=... does use this.
So it looks like the problem case is doing
M.__getitem__(index).__setmask__(values)
hence the copy. The M.mask[]=... is doing
M._mask.__setitem__(index, values)
since _get_mask just does return self._mask.
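The difference between getitem-then-setmask and setitem-on-_mask can be reproduced on a plain (non-structured) masked array. This is a minimal sketch. It assumes the mask is created up front as a full Boolean array, so that item assignment on it is possible:

```python
import numpy as np

M = np.ma.MaskedArray(np.arange(4), mask=np.zeros(4, dtype=bool))

# __getitem__ with a list index builds a new masked array with its own copied mask;
# __setmask__ then runs on that copy, so M is left untouched
M[[1, 2]].mask = True
after_copy = M.mask.tolist()     # [False, False, False, False]

# M.mask exposes M's own mask array, and "[...] =" is a __setitem__ on it
M.mask[[1, 2]] = True
after_setitem = M.mask.tolist()  # [False, True, True, False]
```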
M["f3"].mask[crit, 3] = True
works because M['f3'] is a view. (M[['f1','f3']] is ok for get, but doesn't work for setting).
M.mask["f3"] is also a view. I'm not entirely sure of the order of the relevant gets and sets. __setmask__ has code that deals specifically with compound (structured) dtypes.
=========================
Looking at a structured array, without the masking complication, the indexing order matters
In [607]: M1 = np.arange(16).view("i,i")
In [609]: M1[[3,4]]['f1']=[3,4] # no change
In [610]: M1[[3,4]]['f1']
Out[610]: array([7, 9], dtype=int32)
In [611]: M1['f1'][[3,4]]=[1,2] # change
In [612]: M1
Out[612]:
array([(0, 1), (2, 3), (4, 5), (6, 1), (8, 2), (10, 11), (12, 13), (14, 15)], dtype=[('f0', '<i4'), ('f1', '<i4')])
So we still have a __getitem__ followed by a __setitem__, and we have to pay attention as to whether the get returns a view or a copy.
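This is because, although advanced indexing returns a copy, assigning through advanced indexing still modifies the original. The assignment only goes through advanced indexing (via __setitem__) in the orders where advanced indexing is the last operation. In the other orders, __getitem__ applies the advanced index first, and the assignment lands on the resulting copy.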
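The same rule holds without masks. What matters is whether advanced indexing is the final [...] = step, which is a single __setitem__ on the original, or an intermediate __getitem__, which makes a copy. A small sketch with made-up arrays:

```python
import numpy as np

a = np.arange(6)

# Advanced index first: a[[1, 2]] is a copy, so only the copy changes
a[[1, 2]][0] = 99
unchanged = a.tolist()          # [0, 1, 2, 3, 4, 5]

# Advanced index last: one __setitem__ on a itself
a[[1, 2]] = 99
changed = a.tolist()            # [0, 99, 99, 3, 4, 5]

# Structured array: field access returns a view, so it may come first
s = np.zeros(4, dtype=[('f0', 'i4'), ('f1', 'i4')])
s[[1, 2]]['f1'] = 7             # advanced index first: copy, s unchanged
s['f1'][[1, 2]] = 7             # field view first, advanced-index assignment last
```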