"TypeError: bad operand type for unary ~: 'float'" not down to NA (not available)? - pandas

I'm trying to filter a pandas DataFrame. Following @jezrael's answer here I can use the following to count the rows I will be removing:
mask = ((analytic_events['section'] == 2) &
        ~(analytic_events['identifier'].str[0].str.isdigit()))
print(mask.sum())
However when I run this on my data I get the following error:
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
      1 mask = ((analytic_events['section']==2) &
----> 2        ~(analytic_events['identifier'].str[0].str.isdigit()))
      3
      4 print(mask.sum())

c:\program files\python37\lib\site-packages\pandas\core\generic.py in __invert__(self)
   1454     def __invert__(self):
   1455         try:
-> 1456             arr = operator.inv(com.values_from_object(self))
   1457             return self.__array_wrap__(arr)
   1458         except Exception:

TypeError: bad operand type for unary ~: 'float'
The accepted wisdom for that error, bad operand type for unary ~: 'float', is that the unary operator encountered a NA value (for example, see this answer).
The problem is that I do not have any such missing data. Here's my analysis. Running
analytic_events[analytic_events['section']==2]['identifier'].str[0].value_counts(dropna=False)
gives the results:
2    1207791
3      39289
1        533
.         56
Or running
analytic_events[analytic_events['section']==2]['identifier'].str[0].str.isdigit().value_counts(dropna=False)
gives the results
True     1247613
False         56
(Note that the amounts above sum to the total number of rows, i.e. there are none missing.)
Using the more direct method suggested in @jezrael's answer below,
analytic_events[analytic_events['section']==2]['identifier'].isnull().sum()
analytic_events[analytic_events['section']==2]['identifier'].str[0].isnull().sum()
both produce the output zero. So there are no NA (not available) values.
Why am I getting the error
TypeError: bad operand type for unary ~: 'float'
from the code at the start of this post?

I believe you need to filter by the first condition, and then apply the second test only to the filtered values:
m1 = analytic_events['section'] == 2
mask = ~analytic_events.loc[m1, 'identifier'].str[0].str.isdigit()
print(mask.sum())

pandas.Series.str.replace returns nan

I have some text stored in a pandas.Series. For example:
df.loc[496]
'therapist and friend died in ~2006 Parental/Caregiver obligations:\n'
I need to replace the number in the text with full date, so I wrote
df.str.replace(
    pat=r'(?:[^/])(\d{4}\b)',
    repl=lambda m: ''.join('Jan/1/', m.groups()[0]),
    regex=True
)
but the output is nan, even though I tested the regular expression with findall and there is no issue:
df.str.findall(r'(?:[^/])(\d{4}\b)')
496    [2006]
I don't understand what the issue is. Most of the similar problems raised are about cases where the Series dtype is numeric rather than str, but my case is different: the data type is clearly str. Nonetheless, I tried .astype(str) and still get the same nan result.
A possible solution:
df = pd.Series({496: 'therapist and friend died in ~2006 Parental/Caregiver obligations:\n'})
df.replace(r'~?(\d{4})\b', r'Jan 1, \1', regex=True)
Output:
496 therapist and friend died in Jan 1, 2006 Paren...
dtype: object
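One likely culprit is the repl callable itself: str.join takes a single iterable, so ''.join('Jan/1/', m.groups()[0]) is a TypeError. Passing the pieces as one list keeps the original pattern working; note the pattern's non-capturing group consumes the character before the year, so a faithful replacement has to re-emit it. A sketch:

```python
import pandas as pd

df = pd.Series({496: 'therapist and friend died in ~2006 Parental/Caregiver obligations:\n'})

# ''.join() takes ONE iterable, so the original ''.join('Jan/1/', ...) call
# raises TypeError. Join a single list instead, and re-emit the character
# that the (?:[^/]) group consumed (here the '~' before the year).
out = df.str.replace(
    pat=r'(?:[^/])(\d{4}\b)',
    repl=lambda m: ''.join([m.group(0)[0], 'Jan/1/', m.group(1)]),
    regex=True,
)
print(out[496])   # ... died in ~Jan/1/2006 Parental/Caregiver ...
```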

TypeError: '<' not supported between instances of 'int' and 'Timestamp'

I am trying to change the product name when the period between the expiry date and today is less than 6 months. When I try to add the color, the following error appears:
TypeError: '<' not supported between instances of 'int' and 'Timestamp'.
Validade is the column where the products expiry dates are in. How do I solve it?
epi1 = pd.read_excel('/content/timadatepandasepi.xlsx')
epi2 = epi1.dropna(subset=['Validade'])
pd.DatetimeIndex(epi2['Validade'])
today = pd.to_datetime('today').normalize()
epi2['ate_vencer'] = (epi2['Validade'] - today) / np.timedelta64(1, 'M')

def add_color(x):
    if 0 < x < epi2['ate_vencer']:
        color = 'red'
        return f'background = {color}'

epi2.style.applymap(add_color, subset=['Validade'])
Looking at your data, it seems that you're subtracting two dates and using this result inside your comparison. The problem is likely occurring because df['date1'] - today returns a pandas.Series with values of type pandas._libs.tslibs.timedeltas.Timedelta, and this type of object does not allow you to make comparisons with integers. Here's a possible solution:
epi2['ate_vencer'] = (epi2['Validade'] - today).dt.days

# Now you can compare values from "ate_vencer" with integers. For example:
def f(x):  # Dummy function for demonstration purposes
    return 0 < x < 10

epi2['ate_vencer'].apply(f)  # This works
Example 1 (screenshot): a similar error to yours, subtracting dates and calling f without .dt.days.
Example 2 (screenshot): the same code using .dt.days, which runs without error.
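A minimal runnable version of that fix, with hypothetical dates standing in for the Validade column:

```python
import pandas as pd

today = pd.to_datetime('today').normalize()
# Hypothetical expiry dates: one inside six months, one well beyond
epi2 = pd.DataFrame({'Validade': [today + pd.Timedelta(days=90),
                                  today + pd.Timedelta(days=400)]})

diff = epi2['Validade'] - today        # timedelta64 Series
# 0 < diff.iloc[0] would raise: an int and a Timedelta don't compare
days = diff.dt.days                    # plain integers: 90 and 400
within_6_months = (0 < days) & (days < 180)
print(within_6_months.tolist())        # [True, False]
```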

From one line iteration to loop to include exception management

I have two columns and I need to check whether the value in one column, all_news['Query'], is in another column, all_news['description'].
I found the following solution:
all_news['C'] = on.apply(lambda x: x.Query in x.description, axis=1)
but I get the following error:
TypeError: ("argument of type 'float' is not iterable", 'occurred at index 737')
Likely there are some values the iteration cannot handle, and it seems I cannot do any exception handling inside a one-line apply.
How can I unfold this one line iteration into a for loop?
Result for index 737:
Query = 'medike international'
description = 'po ketvirtadienį praūžusios liūties nukentėjo ne tik kauno miestas, bet ir rajonas. pliaupiant lietui prie vilkijos vydūno alėjos esančioje apžvalgos aikštelėje ...'
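This error usually means description is NaN (a float) on some row, since the in operator is not defined for floats. A sketch of the unrolled loop with try/except, using hypothetical stand-in data for the `on` DataFrame from the one-liner:

```python
import pandas as pd

# Hypothetical stand-in for the `on` DataFrame; the NaN description
# reproduces "argument of type 'float' is not iterable".
on = pd.DataFrame({
    'Query':       ['medike international', 'kaunas'],
    'description': [float('nan'), 'liūtis kaunas rajonas'],
})

results = []
for idx, row in on.iterrows():
    try:
        results.append(row.Query in row.description)
    except TypeError:          # e.g. description is NaN, a float
        results.append(False)

on['C'] = results
print(on['C'].tolist())        # [False, True]
```

Equivalently, the one-liner could keep exception handling by calling a small named function instead of a lambda.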

How to use the PLY module to parse the two lines "1+1 \n 2+2", outputting 2 and 4 respectively

I am parsing "1+1 \n 2+2" with PLY. I think these are two unrelated statements, but PLY reduces them together; how can I make them independent?
# tokens and lexer definitions omitted in the question
def p_statement_expr(p):
    '''statement : expression'''
    print(p[1])

def p_expr_num(p):
    '''expression : NUMBER'''
    p[0] = p[1]

if "__main__" == __name__:
    parser = yacc.yacc(tabmodule="parser_main")
    import time
    t = time.time()
    for i in range(1):
        result = parser.parse("1+1 \n 2+2", debug=True)
    # print(time.time() - t)
    # print(result)
PLY: PARSE DEBUG START
State  : 0
Stack  : . LexToken(NUMBER,1,1,0)
Action : Shift and goto state 3
State  : 3
Stack  : NUMBER . LexToken(ADD,'+',1,1)
Action : Reduce rule [expression -> NUMBER] with [1] and goto state 5
Result : (1)
State  : 5
Stack  : expression . LexToken(ADD,'+',1,1)
Action : Shift and goto state 9
State  : 9
Stack  : expression ADD . LexToken(NUMBER,1,1,2)
Action : Shift and goto state 3
State  : 3
Stack  : expression ADD NUMBER . LexToken(NUMBER,2,2,6)
ERROR: Error  : expression ADD NUMBER . LexToken(NUMBER,2,2,6)
The error is reported when the parser reaches 2+2. How can I support multi-line statement execution, so that the next line runs automatically after the first? PLY has not done anything with the second expression.
Your grammar matches exactly one statement, assuming you are showing all of it. Ply expects the input to terminate at that point, but it doesn't, so Ply complains about an unexpected number.
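One way to fix that, assuming the lexer skips newlines as whitespace, is to let the start symbol accept a sequence of statements. A sketch of the extra rules (the rule names here are illustrative):

```python
# Extra grammar rules so the start symbol accepts one or more statements.
def p_statements_multiple(p):
    '''statements : statements statement'''
    p[0] = p[1] + [p[2]]     # accumulate results of each statement

def p_statements_single(p):
    '''statements : statement'''
    p[0] = [p[1]]

def p_statement_expr(p):
    '''statement : expression'''
    print(p[1])              # print each statement's value as it reduces
    p[0] = p[1]
```

The parser would then be built with yacc.yacc(start='statements'), or by placing the statements rules first, since PLY takes the first rule as the start symbol.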

SAS IML constraining a called function

How do I properly constrain this minimizing function?
Mincvf(cvf1) should minimize cvf1 with respect to h, and I want to impose the constraint h >= 0.4.
proc iml;
EDIT kirjasto.basfraaka var "open";
read all var "open" into cp;
p=cp[1:150];
conh={0.4 . .,. . .,. . .};
m=nrow(p);
m2=38;
pi=constant("pi");
e=constant("e");
start Kmod(x,h,pi,e);
k=1/(h#(2#pi)##(1/2))#e##(-x##2/(2#h##2));
return (k);
finish;
start mhatx2 (m2,newp,h,pi,e);
t5=j(m2,1); /*mhatx omit x=t*/
do x=1 to m2;
i=T(1:m2);
temp1=x-i;
ue=Kmod(temp1,h,pi,e)#newp[i];
le=Kmod(temp1,h,pi,e);
t5[x]=(sum(ue)-ue[x])/(sum(le)-le[x]);
end;
return (t5);
finish;
Start CVF1(h) global (newp,pi,e,m2);
cv3=j(m2,1);
cv3=1/m2#sum((newp-mhatx2(m2,newp,h,pi,e))##2);
return(cv3);
finish;
start mincvf(CVF1);
optn={0,0};
init=1;
call nlpqn(rc, res,"CVF1",init) blc="conh";
return (res);
finish;
start outer(p,m) global(newp);
wl=38; /*window length*/
m1=m-wl; /*last window begins at m-wl*/
newp=j(wl,1);
hyi=j(m1,1);
do x=1 to m1;
we=x+wl-1; /*window end*/
w=T(x:we); /*window*/
newp=p[w];
hyi[x]=mincvf(CVF1);
end;
return (hyi);
finish;
wl=38; /*window length*/
m1=m-wl; /*last window begins at m-wl*/
time=T(1:m1);
ttt=j(m1,1);
ttt=outer(p,m);
print time ttt p;
However I get lots of:
WARNING: Division by zero, result set to missing value.
count : number of occurrences is 2
operation : / at line 1622 column 22
operands : _TEM1003, _TEM1006
_TEM1003 1 row 1 col (numeric)
.
_TEM1006 1 row 1 col (numeric)
0
statement : ASSIGN at line 1622 column 1
traceback : module MHATX2 at line 1622 column 1
module CVF1 at line 1629 column 1
module MINCVF at line 1634 column 1
module OUTER at line 1651 column 1
This happens because of loss of precision when h approaches 0 and "le" in "mhatx2" approaches 0. At h=0.4, le is ~0.08, so I just picked that as a lower bound, which is still precise enough.
Also, the output of the "outer" subroutine, ttt (the vector of h values fitted for the rolling windows), still contains values below the constraint 0.4. Why?
I have solved loss of precision issues previously by simply applying a multiplication transformation to the input... Multiply it by 10,000 or whatever is necessary, and then revert the transformation at the end.
Not sure if it will work in your situation, but it may be worth a shot.
This way it works; I had to pass both the options vector and the constraint matrix as arguments inside the call's parentheses. Now I get no division-by-zero warning. The points previously mis-specified due to loss of precision are now simply left unspecified and the value is substituted by 0.14, but the error is unlikely to be large:
start mincvf(CVF1);
con={0.14 . .,. . .,. . .};
optn={0,0};
init=1;
call nlpqn(rc, res,"CVF1",init,optn,con);
return (res);
finish;