Calculate Average True Range directly with Dataframe

I wonder if there is a simple and direct way to calculate ATR from a DataFrame object. I am stuck on the max() part. This is what I am trying to do:
df['atr']=max( (df['High']-df['Low']), (df['High']-df['Close'].shift()).abs(), (df['Low']-df['Close'].shift()).abs() )
The above code gives this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I understand that the built-in max() is not appropriate for Series objects, but if it worked it would be rather elegant and simple. I just wonder whether the DataFrame object has built-in functions to achieve this.

Following your approach:
np.max( ((df['High']-df['Low']).values, np.abs(df['High']-df['Close'].shift()), np.abs(df['Low']-df['Close'].shift())) , axis=0)
It can also be wrapped in a function (this version avoids the pandas copy warning):
import numpy as np
import pandas as pd
def ATR(data: pd.DataFrame, window=14, use_nan=True) -> pd.Series:
    # Work on a copy so the caller's frame is left untouched
    df_ = data.copy(deep=True)
    df_.loc[:, 'H_L'] = df_['High'] - df_['Low']
    df_.loc[:, 'H_Cp'] = abs(df_['High'] - df_['Close'].shift(1))
    df_.loc[:, 'L_Cp'] = abs(df_['Low'] - df_['Close'].shift(1))
    # True range: the largest of the three candidates on each row
    df_.loc[:, 'TR'] = df_[["H_L", "H_Cp", "L_Cp"]].max(axis=1)
    # Seed with a simple moving average, then smooth recursively
    df_.loc[:, 'ATR'] = df_['TR'].rolling(window).mean()
    atr_col = df_.columns.get_loc('ATR')
    tr_col = df_.columns.get_loc('TR')
    for i in range(window, len(df_)):
        df_.iloc[i, atr_col] = (df_.iloc[i - 1, atr_col] * (window - 1) + df_.iloc[i, tr_col]) / window
    if use_nan:
        df_.iloc[:window, atr_col] = np.nan
    return df_['ATR']
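The true-range step can also stay entirely inside pandas, which is closer to what the question asks for. The following is a minimal sketch, not the answerer's code: it assumes columns named High, Low and Close, uses made-up prices, and takes a plain rolling mean for the ATR instead of the recursive smoothing in the function above. The helper name true_range is hypothetical.

```python
import pandas as pd


def true_range(df: pd.DataFrame) -> pd.Series:
    """Row-wise maximum of the three true-range candidates."""
    prev_close = df["Close"].shift()
    candidates = pd.concat(
        [
            df["High"] - df["Low"],
            (df["High"] - prev_close).abs(),
            (df["Low"] - prev_close).abs(),
        ],
        axis=1,
    )
    # max(axis=1) skips NaN, so the first row falls back to High - Low
    return candidates.max(axis=1)


# Made-up sample prices, for illustration only
prices = pd.DataFrame(
    {
        "High": [10.0, 11.0, 12.0, 11.5],
        "Low": [9.0, 9.5, 10.5, 10.0],
        "Close": [9.5, 10.5, 11.0, 10.2],
    }
)
prices["TR"] = true_range(prices)
prices["ATR"] = prices["TR"].rolling(2).mean()  # window of 2 only to fit the tiny sample
```

The key point is that DataFrame.max(axis=1) compares the candidate columns element by element, whereas the built-in max() tries to reduce whole Series to a single truth value.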

Related

Apply a function to all elements in a dataframe by considering all values of each element's row

I have this dataframe:
name,A,B,C,D,E,F
x,1,2,3,0,5,6
y,5,5,6,0,4,2
z,2,3,3,0,1,1
2012-01-01,106.20,48.80,41.60,1015.04,211.13,643.55
2012-02-01,8.40,-9999.,4.80,15.36,0.37,0.02
2012-03-01,5.20,7.00,12.20,42.70,2.60,0.33
2012-04-01,45.60,29.80,48.20,718.18,-9999.,373.28
2012-05-01,-9999.,21.20,18.30,193.98,17.75,10.34
2012-06-01,122.40,95.30,103.00,4907.95,2527.59,37253.17
2012-07-01,-9999.,98.50,83.70,4122.23,1725.15,21355.74
2012-08-01,-9999.,113.00,94.80,5356.20,2538.84,40836.42
2012-09-01,-9999.,97.80,96.90,4738.41,2295.76,32667.42
2012-10-01,50.20,52.60,47.90,1259.77,301.71,1141.42
2012-11-01,76.40,-9999.,118.00,5858.70,3456.63,60814.94
2012-12-01,73.80,41.90,31.10,651.55,101.32,198.23
As you can see, it represents records of some data for the stations named [A,B,C,D,E,F] at different times. Each station has a position in space with coordinates (x,y,z).
I read it as:
dfrGEO = pd.read_csv(f_name,
                     parse_dates = True,
                     index_col = 0,
                     nrows = 3,
                     infer_datetime_format = True,
                     cache_dates=True).replace(-9999.0, np.nan)
dfrDATA = pd.read_csv(f_name,
                      parse_dates = True,
                      index_col = 0,
                      header = 0,
                      skiprows = range(1,4),
                      infer_datetime_format = True,
                      cache_dates=True).replace(-9999.0, np.nan)
Let's say that I want to apply a function to every element of the dataframe dfrDATA.
My first idea was to set up a double loop with iloc, but this would throw away the advantages of pandas and, I suppose, hurt performance.
Therefore, I came up with this:
def func_each_column(x,dfr):
    """
    here apply again for each row
    """
    res = 1
    return res
res = dfrDATA.apply(func_each_column,args=(dfrDATA))
However, I get this error:
The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
In addition, I would like to know if there is a better way to do what I want.
Thanks
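One likely cause of the error, offered as a reading of the snippet rather than a confirmed diagnosis: args=(dfrDATA) is just dfrDATA in parentheses, not a tuple, so pandas ends up testing the truth value of a whole DataFrame. Writing args=(dfrDATA,) passes the frame as one extra positional argument. The sketch below uses a small stand-in frame and a hypothetical function, scale_by_row_max, that divides each element by the maximum of its row; neither comes from the original post.

```python
import numpy as np
import pandas as pd

# Stand-in for dfrDATA: a few stations with missing values (illustrative data)
data = pd.DataFrame(
    {"A": [106.2, 8.4, np.nan], "B": [48.8, np.nan, 7.0]},
    index=pd.to_datetime(["2012-01-01", "2012-02-01", "2012-03-01"]),
)


def scale_by_row_max(column: pd.Series, frame: pd.DataFrame) -> pd.Series:
    """Hypothetical example: divide each value by the maximum of its row."""
    return column / frame.max(axis=1)


# Note the trailing comma: args must be a tuple
res = data.apply(scale_by_row_max, args=(data,))

# The same result without apply, using a vectorized row-wise division
vectorized = data.div(data.max(axis=1), axis=0)
```

When the per-element function only needs simple row statistics, the vectorized form is usually the better way, since it avoids calling Python code once per column.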

Explanation of pandas DataFrame.assign() behaviour using lambda

import pandas as pd
import numpy as np
np.random.seed(99)
rows = 10
df = pd.DataFrame ({'A' : np.random.choice(range(0, 2), rows, replace = True),
                    'B' : np.random.choice(range(0, 2), rows, replace = True)})
def get_C1(row):
    return row.A + row.B
def get_C2(row):
    return 'X' if row.A + row.B == 0 else 'Y'
def get_C3(row):
    is_zero = row.A + row.B
    return "X" if is_zero else "Y"
df = df.assign(C = lambda row: get_C3(row))
Why do the get_C2 and get_C3 functions raise an error?
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You're thinking that df.assign, when passed a function, behaves like df.apply with axis=1, which calls the function for each row.
That's incorrect.
Per the docs for df.assign:
"Where the value is a callable, evaluated on df"
That means that the function you pass to assign is called on the whole dataframe instead of each individual row.
So, in your function get_C3, the row parameter is not a row at all. It's a whole dataframe (and should be renamed to df or something else) and so row.A and row.B are two whole columns, rather than single cell values.
Thus, is_zero is a whole column as well, and ... if is_zero ... will not work.
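Two ways to get the intended column are sketched below, assuming the goal is the get_C2 logic ("X" when A + B is 0, otherwise "Y"). The small frame and the column names C2 and C2_rowwise are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 0, 1], "B": [0, 0, 1, 1]})

# Vectorized: the callable receives the whole frame, so work on whole columns
df = df.assign(C2=lambda d: np.where(d.A + d.B == 0, "X", "Y"))

# Row-wise: apply with axis=1 calls the inner function once per row
df = df.assign(
    C2_rowwise=lambda d: d.apply(
        lambda row: "X" if row.A + row.B == 0 else "Y", axis=1
    )
)
```

The vectorized form is generally preferable on large frames; the row-wise form is closer to what the question expected assign to do.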

iterating pandas rows using .apply()

I wanted to iterate through the pandas data frame but for some reason it does not work with .apply() method.
train = pd.read_csv('../kaggletrain')
pclass = train['Pclass']
# pclass has list of data with either 1, 2 or 3..
# so wanted to return if the cell is 1 then return True or everything False
def abc(pclass):
    if pclass == 1:
        return True
    else:
        return False
ABCDEFG = train.apply(abc, axis=1)
This gives ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thank you for your help

ABCDEFG = train[train['Pclass']==1]
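The error arises because train.apply(abc, axis=1) hands abc an entire row as a Series, so the comparison inside the if yields a Series rather than a single True or False. If the goal is a True/False value per row instead of the filtered rows, a direct comparison on the column is enough. The sketch below uses a tiny made-up frame in place of the Kaggle file; the Fare values are invented.

```python
import pandas as pd

# Stand-in for the training data (values are illustrative)
train = pd.DataFrame({"Pclass": [1, 3, 2, 1], "Fare": [71.3, 7.9, 13.0, 53.1]})

# Vectorized comparison: one boolean per row
is_first_class = train["Pclass"] == 1

# Equivalent with apply on the single column, where each call sees one scalar
is_first_class_apply = train["Pclass"].apply(lambda pclass: pclass == 1)

# The boolean Series can then be used as a mask to filter rows
first_class_rows = train[is_first_class]
```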

how to compare the values of two columns using condition, and assign a value when that condition is met

I want to compare the home_score and away_score column values and, in a new column, assign homeloss if home_score < away_score, homewin if home_score > away_score, and draw if home_score == away_score.
era1800_1900 = era(eras,1800,1900)
era1800_1900["result"] = era1800_1900[(era1800_1900["home_score"] < era1800_1900["away_score"] == "Lose")]
I expect a new result column in my data frame with the values homeloss, homewin and draw, based on the scores, but I get this error when I run the code:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-78-58ef8c4a0715> in <module>
      1 era1800_1900 = era(eras,1800,1900)
----> 2 era1800_1900["result"] = era1800_1900[(era1800_1900["home_score"] < era1800_1900["away_score"] == "Lose")]
~\Anaconda3 new\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577
   1578     __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Try the following approach (A and B stand for home_score and away_score):
era['result'] = None
era.loc[era[era['A'] < era['B']].index.values,'result'] = 'homelose'
era.loc[era[era['A'] > era['B']].index.values,'result'] = 'homewin'
era.loc[era[era['A'] == era['B']].index.values,'result'] = 'homedraw'
If you are comfortable with functions, look at this example
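The linked example is not reproduced here. As a separate illustration, the original expression fails because Python chains the comparison, so a < b == "Lose" is evaluated as (a < b) and (b == "Lose"), and the and forces a Series to a single truth value. A compact alternative is numpy.select, sketched below on a made-up three-game frame; the name games and the scores are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative scores only
games = pd.DataFrame({"home_score": [0, 2, 1], "away_score": [1, 0, 1]})

conditions = [
    games["home_score"] < games["away_score"],
    games["home_score"] > games["away_score"],
]
labels = ["homeloss", "homewin"]

# Rows matching neither condition are ties and take the default label
games["result"] = np.select(conditions, labels, default="draw")
```

Each condition is a whole boolean Series, and np.select picks the first matching label per row, so no Python-level if is ever applied to a Series.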

Apply functions to multiple columns with pandas

I have 2 functions like this one:
def wind_index(result):
    if result > 10:
        return 1
    elif (result > 0) & (result <= 5):
        return 1.5
    elif (result > 5) & (result <= 10):
        return 2
def get_thermal_index(temp, hum):
    return wind_index(temp - 0.4*(temp-10)*((1-hum)/100))
When I try to apply this function like this:
df['tci'] = get_thermal_index(df['tempC'], df['humidity'])
I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What else can I do to get a new column for my DataFrame using those functions?

You can use Series.apply:
def get_thermal_index(temp, hum):
    return (temp - 0.4*(temp-10)*((1-hum)/100)).apply(wind_index)
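Series.apply still calls wind_index once per element. A fully vectorized variant is sketched below with numpy.select. It is an alternative to the answer above, not part of it; the name wind_index_vectorized and the sample values are hypothetical, and it assumes results at or below 0 (which the original function leaves unhandled) should become NaN.

```python
import numpy as np
import pandas as pd


def wind_index_vectorized(result: pd.Series) -> pd.Series:
    """Same thresholds as wind_index, evaluated on the whole Series at once."""
    conditions = [
        result > 10,
        (result > 0) & (result <= 5),
        (result > 5) & (result <= 10),
    ]
    choices = [1.0, 1.5, 2.0]
    # Assumption: values <= 0 match no branch and become NaN
    values = np.select(conditions, choices, default=np.nan)
    return pd.Series(values, index=result.index)


def get_thermal_index(temp: pd.Series, hum: pd.Series) -> pd.Series:
    return wind_index_vectorized(temp - 0.4 * (temp - 10) * ((1 - hum) / 100))


# Illustrative readings only
df = pd.DataFrame(
    {"tempC": [3.0, 8.0, 25.0, -5.0], "humidity": [50.0, 50.0, 50.0, 50.0]}
)
df["tci"] = get_thermal_index(df["tempC"], df["humidity"])
```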
