How to fix the calculation error "'DataFrame' object is not callable" in pandas

I'm working on a football data set and I'm getting the following error. Please help.
#what is the win rate of HomeTeam?
n_matches = df.shape[0]
n_features = df.shape[1] -1
n_homewin = len(df(df.FTR == 'H'))
win_rate = (float(n_homewin) / (n_matches)) * 100
print ("Total number of matches,{}".format(n_matches))
print ("Number of features,{}".format(n_features))
print ("Number of maches won by hom team,{}".format (n_homewin))
print ("win rate of home team,{:.2f}%" .format(win_rate))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-122-7e4d81fc684e> in <module>
5 n_features = df.shape[1] -1
6
----> 7 n_homewin = len(df(df.FTR == 'H'))
8
9 win_rate = (float(n_homewin) / (n_matches)) * 100
TypeError: 'DataFrame' object is not callable
The expected result should print the home team's win rate.

I think the problem is the (); you need [] to filter by boolean indexing:
n_homewin = len(df[df.FTR == 'H'])
Or, more simply, count the True values with sum:
n_homewin = (df.FTR == 'H').sum()

You should change it to df[df.FTR == 'H']. The parentheses imply a function call, and a DataFrame is not callable.
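To illustrate both answers, here is a minimal sketch on a small made-up frame. The sample rows are an assumption; only the FTR column name and the 'H' value come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the FTR column and the 'H' value come from the question.
df = pd.DataFrame({"FTR": ["H", "A", "H", "D", "H"]})

home_win = df.FTR == "H"              # boolean Series, True where the home team won

n_homewin_len = len(df[home_win])     # square brackets: filter rows, then count them
n_homewin_sum = int(home_win.sum())   # or count the True values directly

win_rate = n_homewin_sum / len(df) * 100
print("win rate of home team,{:.2f}%".format(win_rate))

# Parentheses try to call the DataFrame like a function, which raises TypeError.
try:
    df(home_win)
    call_failed = False
except TypeError:
    call_failed = True
```

Both counting styles give the same number here; the sum variant avoids building the filtered frame.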

Related

TypeError: descriptor 'lower' for 'str' objects doesn't apply to a 'list' object

I want to stem my dataset. Before stemming, I tokenized the text with the nltk tokenizer.
You can see the output in the picture:
[Image: Dataset, column values]
But when I do the stemming, it returns an error:
TypeError Traceback (most recent call last)
<ipython-input-102-7700a8e3235b> in <module>()
----> 1 df['Message'] = df['Message'].apply(stemmer.stem)
2 df = df[['Message', 'Category']]
3 df.head()
5 frames
/usr/local/lib/python3.7/dist-packages/Sastrawi/Stemmer/Filter/TextNormalizer.py in normalize_text(text)
2
3 def normalize_text(text):
----> 4 result = str.lower(text)
5 result = re.sub(r'[^a-z0-9 -]', ' ', result, flags = re.IGNORECASE|re.MULTILINE)
6 result = re.sub(r'( +)', ' ', result, flags = re.IGNORECASE|re.MULTILINE)
TypeError: descriptor 'lower' requires a 'str' object but received a 'list'
I hope you can help me.
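The traceback suggests that each cell of 'Message' holds a list of tokens after tokenizing, while the stemmer expects a single string. A minimal sketch of two possible workarounds follows. Here toy_stem is a hypothetical stand-in for stemmer.stem (it only lowercases, using the same str.lower call that fails in the traceback), and the token list is made up.

```python
def toy_stem(text):
    """Hypothetical stand-in for stemmer.stem: it expects one string."""
    return str.lower(text)  # the same call that fails in the traceback for a list

tokens = ["Halo", "Dunia"]  # made-up tokenized message

# Passing the whole list reproduces the TypeError.
try:
    toy_stem(tokens)
    list_failed = False
except TypeError:
    list_failed = True

# Option 1: stem each token separately and keep a list.
stemmed_tokens = [toy_stem(token) for token in tokens]

# Option 2: join the tokens back into one string before stemming.
stemmed_text = toy_stem(" ".join(tokens))

print(stemmed_tokens, stemmed_text)
```

With the real stemmer, the first option would presumably look like df['Message'].apply(lambda tokens: [stemmer.stem(t) for t in tokens]).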

ValueError: could not convert string to float: 'n/a'

My Error is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-88-b8c70965fe8e> in <module>
10 new_df["CustomerLocation"] = new_df.CustomerLocation.replace(x, (float(x[0]), float(x[1])))
11 print("floated x:", new_df.CustomerLocation)
---> 12 new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (float(y[0]), float(y[1])))
13 print(haversine((float(x[0]), float(x[1])), (float(y[0]), float(y[1])), unit='mi'), "miles")
14 else:
ValueError: could not convert string to float: 'n/a'
My code is:
new_df=pd.DataFrame({'MerchantLocation':(tuple(num) for num in data.merchant_long_lat), 'CustomerLocation': (tuple(num) for num in data.long_lat)})
for index, row in new_df.iterrows():
    x=row.CustomerLocation
    y=row.MerchantLocation
    #new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (y[0].replace(r'^\s*$', np.NaN, regex=True), y[1].replace(r'^\s*$', np.NaN, regex=True)))
    if x[0]!="" or x[1]!="" or y[0]!="" or y[1]!="":
        print("x:",x)
        new_df["CustomerLocation"] = new_df.CustomerLocation.replace(x, (float(x[0]), float(x[1])))
        new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (float(y[0]), float(y[1])))
        print(haversine((float(x[0]), float(x[1])), (float(y[0]), float(y[1])), unit='mi'), "miles")
    else:
        print("There is an empty string")
I checked the raw Excel data and there is an empty cell in it. This is from ANZ's Virtual Internship. I am unable to catch the empty string. Please help!
Try surrounding the problematic code with a try/except statement. The failing value is the string 'n/a', not an empty string, so the != "" check does not catch it.
Something like:
if x[0]!="" or x[1]!="" or y[0]!="" or y[1]!="":
    try:
        print("x:",x)
        new_df["CustomerLocation"] = new_df.CustomerLocation.replace(x, (float(x[0]), float(x[1])))
        new_df["MerchantLocation"] = new_df.MerchantLocation.replace(y, (float(y[0]), float(y[1])))
        print(haversine((float(x[0]), float(x[1])), (float(y[0]), float(y[1])), unit='mi'), "miles")
    except ValueError:
        print("one of these should be N/A")
        print(x[0], x[1], y[0], y[1])
else:
    print("There is an empty string")
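Note also that the condition joined with or is true as soon as any one of the four values is non-empty, so it does not guard the float() calls. As an alternative sketch, the conversion can be wrapped in a small helper that returns None for anything that is not numeric. The name parse_pair and the sample pairs are assumptions for illustration, not part of the original code.

```python
import math

def parse_pair(pair):
    """Return (float, float), or None if either value is not a usable number."""
    try:
        first, second = float(pair[0]), float(pair[1])
    except (TypeError, ValueError):
        # float("n/a") and float("") both raise ValueError; None raises TypeError.
        return None
    if math.isnan(first) or math.isnan(second):
        return None
    return first, second

# Made-up (longitude, latitude) string pairs, including the problem values.
pairs = [("153.41", "-27.95"), ("n/a", "-33.87"), ("", "")]
parsed = [parse_pair(pair) for pair in pairs]
print(parsed)
```

Rows where either location parses to None can then be skipped before calling haversine.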

Pandas function with 2 arguments to find threshold

I need to count the people whose gain is greater than or equal to a threshold.
The dataframe contains a column 'capitalGain' with different values: 10, 20, 30, 50, 1000, 5000, 10000, etc.
I tried:
Function:
def get_num_people_with_higher_gain(dataframe, threshold_gain):
    threshold_gain = dataframe["capitalGain"][dataframe["capitalGain"] >= threshold_gain].count()
    return threshold_gain
Call function:
df = get_num_people_with_higher_gain(dataframe, threshold_gain)
But I get the following error message:
NameError Traceback (most recent call last)
<ipython-input-50-5485c90412c8> in <module>
----> 1 df = get_num_people_with_higher_gain(dataframe, threshold_gain)
2 threshold = get_num_people_with_higher_gain(dataframe, threshold_gain)
NameError: name 'dataframe' is not defined
Since the function takes 2 arguments (dataframe, threshold_gain), does that mean both should somehow be defined within the function?
Thanks
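Finally, here is the solution. The arguments are not defined inside the function; they must already exist where the function is called, so dataframe has to be a loaded DataFrame and the threshold a concrete value:
def get_num_people_with_higher_gain(dataframe, threshold_gain):
    result = len(dataframe[dataframe["capitalGain"] >= threshold_gain])
    return result
result = get_num_people_with_higher_gain(dataframe, 60000)
result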
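As a self-contained sketch of the same idea, the example below builds a small frame first and then passes it in. The variable name people and the threshold 1000 are assumptions; the capitalGain values are the ones listed in the question.

```python
import pandas as pd

def get_num_people_with_higher_gain(dataframe, threshold_gain):
    """Count rows whose capitalGain is greater than or equal to the threshold."""
    return int((dataframe["capitalGain"] >= threshold_gain).sum())

# The frame must exist in the calling scope before it is passed to the function.
people = pd.DataFrame({"capitalGain": [10, 20, 30, 50, 1000, 5000, 10000]})

count = get_num_people_with_higher_gain(people, 1000)
print(count)  # 3
```

The parameter names dataframe and threshold_gain only exist inside the function; the caller can use any names for the values it passes.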

Error: len() of unsized object - Wilcoxon signed-rank test

I am running a Wilcoxon signed-rank test on a dataset which looks like:
df = {'Year': ['2019','2018','2017', ....], 'Name':{jon, tim, luca,...}, 'SelfPromotion': [1,0,1,...]}
the script is as follows:
import pandas
from scipy.stats import mannwhitneyu
data1 = df['SelfPromotion']=1
data2 = df['SelfPromotion']=0
print(mannwhitneyu(data1, data2))
this gives me the following error:
TypeError: len() of unsized object
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-e49d9838e5ac> in <module>
3 data1 = data['SelfPromotion']=1
4 data2 = data['SelfPromotion']=0
----> 5 print(mannwhitneyu(data11, data22))
~/opt/anaconda3/envs/shityaar/lib/python3.7/site-packages/scipy/stats/stats.py in mannwhitneyu(x, y, use_continuity, alternative)
6391 x = np.asarray(x)
6392 y = np.asarray(y)
-> 6393 n1 = len(x)
6394 n2 = len(y)
6395 ranked = rankdata(np.concatenate((x, y)))
TypeError: len() of unsized object
I have tried every possible solution for this error by looking at similar questions but unfortunately, no solution could get it to work. I would appreciate some help.
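mannwhitneyu expects array-like parameters, but you are passing integers, hence the failure. The chained assignment data1 = df['SelfPromotion']=1 sets both the column and data1 to the integer 1; use the comparison operator == instead.
Do something like this: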
In [26]: data1 = df['SelfPromotion'] == 1
In [28]: data2 = df['SelfPromotion'] == 0
In [31]: mannwhitneyu(data1, data2)
Out[31]: MannwhitneyuResult(statistic=3.0, pvalue=0.30962837708843105)
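The sketch below shows, on made-up data, why the single = fails and how an == mask can select the two groups. The Score column is an assumption added for illustration; the question's data only lists Year, Name and SelfPromotion.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the Score column is an assumption added for illustration.
df = pd.DataFrame({
    "SelfPromotion": [1, 0, 1, 0, 1],
    "Score": [3.2, 1.5, 4.1, 2.0, 3.8],
})

# Chained assignment: a single "=" overwrites the column and binds data1 to the int 1.
scratch = df.copy()
data1 = scratch["SelfPromotion"] = 1
data1_is_int = isinstance(data1, int)

# np.asarray(1) is a 0-dimensional array, so len() raises TypeError.
try:
    len(np.asarray(data1))
    len_failed = False
except TypeError:
    len_failed = True

# "==" builds a boolean mask, which can select the values of each group.
mask = df["SelfPromotion"] == 1
promoted = df.loc[mask, "Score"]
not_promoted = df.loc[~mask, "Score"]
print(len(promoted), len(not_promoted))
```

Both promoted and not_promoted are array-like, so they could be passed as mannwhitneyu(promoted, not_promoted). Note that mannwhitneyu is the Mann-Whitney U test for independent samples; for a signed-rank test on paired samples, scipy.stats.wilcoxon would be the function to look at.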

Convert labels(int) into one-hot vectors for tensorflow

Kindly help me resolve my error. Thanks.
This is my Python code:
The shape of Y is (199584, 1) and its data type is int.
num_labels = len(np.unique(Y))
simulated_labels = np.eye(num_labels)[Y] # One liner trick!
print simulated_labels
Error:
IndexError Traceback (most recent call last)
in ()
1 num_labels = len(np.unique(Y)) # unique labels 681
2 print num_labels
----> 3 simulated_labels = np.eye(num_labels)[Y] # One liner trick!
4 print simulated_labels
5
IndexError: index 1001 is out of bounds for axis 0 with size 681
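You can use tf.one_hot (there are examples in the docstring). The np.eye trick fails because the label values are not the contiguous range 0 to 680: there are 681 unique labels, but values such as 1001 exceed the size of the identity matrix.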
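If staying with NumPy is preferred, one possible sketch is to remap the labels to contiguous indices before indexing the identity matrix. The small Y below is made-up data with the same (n, 1) layout as in the question.

```python
import numpy as np

# Made-up labels with gaps, shaped (n, 1) like Y in the question.
Y = np.array([[3], [1001], [7], [3]])

# return_inverse gives, for each label, its position in the sorted unique labels.
classes, indices = np.unique(Y.ravel(), return_inverse=True)

# Every index is now in the range 0 .. len(classes) - 1, so np.eye can be indexed safely.
one_hot = np.eye(len(classes))[indices]
print(one_hot.shape)  # (4, 3)
```

The classes array keeps the original label values, so classes[indices] recovers Y if the mapping needs to be reversed.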