I have this dataframe:
name,A,B,C,D,E,F
x,1,2,3,0,5,6
y,5,5,6,0,4,2
z,2,3,3,0,1,1
2012-01-01,106.20,48.80,41.60,1015.04,211.13,643.55
2012-02-01,8.40,-9999.,4.80,15.36,0.37,0.02
2012-03-01,5.20,7.00,12.20,42.70,2.60,0.33
2012-04-01,45.60,29.80,48.20,718.18,-9999.,373.28
2012-05-01,-9999.,21.20,18.30,193.98,17.75,10.34
2012-06-01,122.40,95.30,103.00,4907.95,2527.59,37253.17
2012-07-01,-9999.,98.50,83.70,4122.23,1725.15,21355.74
2012-08-01,-9999.,113.00,94.80,5356.20,2538.84,40836.42
2012-09-01,-9999.,97.80,96.90,4738.41,2295.76,32667.42
2012-10-01,50.20,52.60,47.90,1259.77,301.71,1141.42
2012-11-01,76.40,-9999.,118.00,5858.70,3456.63,60814.94
2012-12-01,73.80,41.90,31.10,651.55,101.32,198.23
As you can see, it holds records for the stations named [A,B,C,D,E,F] at different times. Each station has a position in space with coordinates (x,y,z).
I read it as:
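import numpy as np
import pandas as pd
# Station coordinates: the three rows (x, y, z) directly below the header.
dfrGEO = pd.read_csv(f_name,
                     parse_dates = True,
                     index_col = 0,
                     nrows = 3,
                     infer_datetime_format = True,
                     cache_dates=True).replace(-9999.0, np.nan)
# Time series: skip the three coordinate rows and keep the header.
dfrDATA = pd.read_csv(f_name,
                      parse_dates = True,
                      index_col = 0,
                      header = 0,
                      skiprows = range(1,4),
                      infer_datetime_format = True,
                      cache_dates=True).replace(-9999.0, np.nan)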
Let's say that I want to apply a function to every element of the dataframe dfrDATA.
The first idea would be to set up a double loop with iloc, but that would throw away the advantages of pandas and, I suppose, hurt performance.
Therefore, I came up with this:
def func_each_column(x,dfr):
    """
    here apply again for each row
    """
    res = 1
    return res

res = dfrDATA.apply(func_each_column,args=(dfrDATA))
However, I have this error:
The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
In addition, I would like to know if there is a better way to do what I want.
Thanks
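A likely cause, offered as a sketch rather than a confirmed diagnosis: in Python, `(dfrDATA)` is just `dfrDATA`, not a tuple, so `apply` receives a DataFrame where it expects a tuple of extra arguments, and checking whether that argument is empty would raise the "truth value" error. Writing `args=(dfrDATA,)` with a trailing comma passes a one-element tuple. The small frame and the computation inside `func_each_column` below are assumptions made only to have something runnable.

```python
import numpy as np
import pandas as pd

# Small stand-in for dfrDATA (first rows of the question's data, -9999. already NaN).
dfrDATA = pd.DataFrame(
    {"A": [106.20, 8.40, 5.20], "B": [48.80, np.nan, 7.00]},
    index=pd.to_datetime(["2012-01-01", "2012-02-01", "2012-03-01"]),
)

def func_each_column(col, dfr):
    # col is one column (a Series); dfr is the extra argument passed through args.
    # Hypothetical computation: scale each column by the largest value in the frame.
    return col / dfr.max().max()

# Note the trailing comma: (dfrDATA,) is a one-element tuple, (dfrDATA) is just dfrDATA.
res = dfrDATA.apply(func_each_column, args=(dfrDATA,))

# The same result without apply, using whole-frame arithmetic.
res_vectorized = dfrDATA / dfrDATA.max().max()
```

When the operation can be written as arithmetic on the whole frame, as in the last line, that form is usually the simplest and avoids both the loops and `apply`.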
import pandas as pd
import numpy as np

np.random.seed(99)
rows = 10
df = pd.DataFrame ({'A' : np.random.choice(range(0, 2), rows, replace = True),
                    'B' : np.random.choice(range(0, 2), rows, replace = True)})

def get_C1(row):
    return row.A + row.B

def get_C2(row):
    return 'X' if row.A + row.B == 0 else 'Y'

def get_C3(row):
    is_zero = row.A + row.B
    return "X" if is_zero else "Y"

df = df.assign(C = lambda row: get_C3(row))
Why do the get_C2 and get_C3 functions raise an error?
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You're thinking that df.assign, when passed a function, behaves like df.apply with axis=1, which calls the function for each row.
That's incorrect.
Per the docs for df.assign:
"Where the value is a callable, evaluated on df"
That means that the function you pass to assign is called on the whole dataframe instead of each individual row.
So, in your function get_C3, the row parameter is not a row at all. It's a whole dataframe (and should be renamed to df or something else) and so row.A and row.B are two whole columns, rather than single cell values.
Thus, is_zero is a whole column as well, and ... if is_zero ... will not work.
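Two possible ways around this, as a minimal sketch: either write the callable so that it works on whole columns (here with `np.where`, which is a swapped-in technique and not part of the original code), or keep the row-wise function and call it through `df.apply(..., axis=1)`. The column name `C_rowwise` is made up for the comparison.

```python
import numpy as np
import pandas as pd

np.random.seed(99)
rows = 10
df = pd.DataFrame({
    "A": np.random.choice(range(0, 2), rows, replace=True),
    "B": np.random.choice(range(0, 2), rows, replace=True),
})

# Option 1: keep assign, but write the callable for a whole DataFrame.
# np.where evaluates the condition element by element.
df = df.assign(C=lambda d: np.where(d.A + d.B == 0, "X", "Y"))

# Option 2: keep the row-wise function and call it once per row with apply.
def get_C2(row):
    return "X" if row.A + row.B == 0 else "Y"

df["C_rowwise"] = df.apply(get_C2, axis=1)
```

Both columns hold the same labels; the first form avoids a Python-level call per row.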
I wanted to iterate through a pandas DataFrame, but for some reason it does not work with the .apply() method.
train = pd.read_csv('../kaggletrain')
pclass = train['Pclass']
# pclass holds the values 1, 2 or 3,
# so I want True where the cell is 1 and False everywhere else
def abc(pclass):
    if pclass == 1:
        return True
    else:
        return False

ABCDEFG = train.apply(abc, axis=1)
This gives ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thank you for your help
ABCDEFG = train[train['Pclass']==1]
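To spell out what is going on, as a sketch on made-up data: `train.apply(abc, axis=1)` hands each whole row (a Series) to `abc`, so `if pclass == 1` tries to turn a Series into a single True/False. Comparing the column itself with 1 gives the True/False values directly, and the same mask can then be used to select rows. The small frame below is a hypothetical stand-in for the Kaggle file.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle training file; only Pclass matters here.
train = pd.DataFrame({"Pclass": [1, 3, 2, 1, 3]})

# Comparing a Series with a scalar is element-wise and returns a boolean Series.
is_first_class = train["Pclass"] == 1

# The same mask selects the matching rows.
first_class_rows = train[is_first_class]

# If a per-element function is really wanted, apply it to the column, not the frame.
def abc(pclass):
    return pclass == 1

is_first_class_apply = train["Pclass"].apply(abc)
```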
I want to compare the home_score and away_score column values and store the outcome in a new column: homeloss if home_score < away_score, homewin if home_score > away_score, and draw if home_score == away_score.
era1800_1900 = era(eras,1800,1900)
era1800_1900["result"] = era1800_1900[(era1800_1900["home_score"] < era1800_1900["away_score"] == "Lose")]
I expect another column, result, in my data frame with the values homeloss, homewin and draw depending on the scores, but I get this error when I run the code above:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-78-58ef8c4a0715> in <module>
      1 era1800_1900 = era(eras,1800,1900)
----> 2 era1800_1900["result"] = era1800_1900[(era1800_1900["home_score"] < era1800_1900["away_score"] == "Lose")]
~\Anaconda3 new\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577
   1578     __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Try the following approach (A and B stand for the home_score and away_score columns):
era['result'] = None
era.loc[era['A'] < era['B'], 'result'] = 'homelose'
era.loc[era['A'] > era['B'], 'result'] = 'homewin'
era.loc[era['A'] == era['B'], 'result'] = 'homedraw'
If you are comfortable with functions, look at this example
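As for why the original line fails: Python reads `a < b == "Lose"` as a chained comparison, `(a < b) and (b == "Lose")`, and `and` needs a single True/False from the Series on its left. One possible sketch, assuming the columns are named home_score and away_score as in the question and using made-up scores, first with `np.select` (a swapped-in technique) and then with a per-row function (`label_result` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Hypothetical scores standing in for era1800_1900.
era1800_1900 = pd.DataFrame({"home_score": [0, 2, 1], "away_score": [1, 1, 1]})

home = era1800_1900["home_score"]
away = era1800_1900["away_score"]

# np.select picks, for each row, the label of the first condition that is True.
era1800_1900["result"] = np.select(
    [home < away, home > away],
    ["homeloss", "homewin"],
    default="draw",
)

# Function-based variant: called once per row, so plain if statements are fine here.
def label_result(row):
    if row["home_score"] < row["away_score"]:
        return "homeloss"
    if row["home_score"] > row["away_score"]:
        return "homewin"
    return "draw"

era1800_1900["result_rowwise"] = era1800_1900.apply(label_result, axis=1)
```

The `np.select` form works on whole columns at once; the function form is easier to extend with extra rules but calls Python once per row.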
I have 2 functions like this one:
def wind_index(result):
    if result > 10:
        return 1
    elif (result > 0) & (result <= 5):
        return 1.5
    elif (result > 5) & (result <= 10):
        return 2

def get_thermal_index(temp, hum):
    return wind_index(temp - 0.4*(temp-10)*((1-hum)/100))
When I try to apply this function like this:
df['tci'] = get_thermal_index(df['tempC'], df['humidity'])
I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What else can I do to get a new column for my DataFrame using those functions?
You can use Series.apply:
def get_thermal_index(temp, hum):
    return (temp - 0.4*(temp-10)*((1-hum)/100)).apply(wind_index)
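If the per-element call becomes a bottleneck, the thresholds can also be evaluated on the whole Series at once. This is only a sketch of an alternative, not the approach given above: `wind_index_vectorized` is a hypothetical name, the sample values are made up, and since `wind_index` has no branch for values of 0 or below, NaN is assumed for them here.

```python
import numpy as np
import pandas as pd

def wind_index_vectorized(result):
    # Same thresholds as wind_index, evaluated on a whole Series at once.
    conditions = [
        result > 10,
        (result > 0) & (result <= 5),
        (result > 5) & (result <= 10),
    ]
    choices = [1.0, 1.5, 2.0]
    # Values <= 0 match no branch of wind_index; NaN is assumed for them here.
    values = np.select(conditions, choices, default=np.nan)
    return pd.Series(values, index=result.index)

def get_thermal_index(temp, hum):
    return wind_index_vectorized(temp - 0.4 * (temp - 10) * ((1 - hum) / 100))

# Hypothetical sample values.
df = pd.DataFrame({"tempC": [3.0, 8.0, 25.0], "humidity": [50.0, 60.0, 70.0]})
df["tci"] = get_thermal_index(df["tempC"], df["humidity"])
```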