I am having a problem with a for loop that includes DataFrames - pandas

I have a DataFrame with 8 columns. If two of those columns satisfy a condition, I have to fill two columns with the product of two others. After running the algorithm it is not working.
I have tried using Series, and I have tried suppressing the warnings with

import warnings
warnings.filterwarnings("ignore")

but it is not working.
for i in seq:
    if dataframefinal['trade'][i] == 1 and dataframefinal['z'][i] > 0:
        dataframefinal['CloseAdj2'][i] = dataframefinal['Close2'][i] * dataframefinal['trancosshort'][i]
        dataframefinal['CloseAdj1'][i] = dataframefinal['Close1'][i] * dataframefinal['trancostlong'][i]
    elif dataframefinal['trade'][i] == 1 and dataframefinal['z'][i] < 0:
        dataframefinal['CloseAdj2'][i] = dataframefinal['Close1'][i] * dataframefinal['trancosshort'][i]
        dataframefinal['CloseAdj1'][i] = dataframefinal['Close2'][i] * dataframefinal['trancostlong'][i]
    else:
        dataframefinal['CloseAdj1'][i] = dataframefinal['Close1'][i]
        dataframefinal['CloseAdj2'][i] = dataframefinal['Close2'][i]
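A side note on the assignment pattern itself: df['col'][i] = ... is chained indexing, which pandas may apply to a temporary copy (hence the warnings being silenced above). A minimal sketch of the .loc[row, col] form, with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"trade": [1, 0], "Close2": [5.0, 6.0], "CloseAdj2": [0.0, 0.0]})

# .loc addresses row and column in one call, so the write always
# lands in the original frame instead of a possible copy
df.loc[0, "CloseAdj2"] = df.loc[0, "Close2"] * 2
print(df["CloseAdj2"].tolist())  # [10.0, 0.0]
```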

You can use the vectorized conditional function numpy.select() to do this quickly:
import numpy as np
import pandas as pd
from numpy.random import randn, randint

n = 10
df_data = pd.DataFrame(dict(trade=randint(0, 2, n),
                            z=randn(n),
                            Close1=randn(n),
                            Close2=randn(n),
                            trancosshort=randn(n),
                            trancostlong=randn(n)))
df_data["CloseAdj1"] = 0
df_data["CloseAdj2"] = 0
seq = [1, 3, 5, 7, 9]
df = df_data.loc[seq]
cond1 = df.eval("trade == 1 and z > 0")
cond2 = df.eval("trade == 1 and z < 0")   # the elif branch in the question also tests trade == 1
df["CloseAdj2"] = np.select([cond1, cond2],
                            [df.eval("Close2 * trancosshort"),
                             df.eval("Close1 * trancosshort")], df.Close2)
df["CloseAdj1"] = np.select([cond1, cond2],
                            [df.eval("Close1 * trancostlong"),
                             df.eval("Close2 * trancostlong")], df.Close1)
df_data.loc[seq, ["CloseAdj1", "CloseAdj2"]] = df[["CloseAdj1", "CloseAdj2"]]
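In case np.select's semantics are unclear: row by row it picks the value belonging to the first condition that is True, falling back to the default otherwise. A deterministic toy frame (values made up for illustration) shows this:

```python
import numpy as np
import pandas as pd

# One row per branch: trade==1 with z>0, trade==1 with z<0, and the default case
df = pd.DataFrame({"trade": [1, 1, 0],
                   "z": [0.5, -0.5, 1.0],
                   "Close1": [10.0, 10.0, 10.0],
                   "Close2": [20.0, 20.0, 20.0],
                   "trancosshort": [2.0, 2.0, 2.0],
                   "trancostlong": [3.0, 3.0, 3.0]})

cond1 = (df.trade == 1) & (df.z > 0)
cond2 = (df.trade == 1) & (df.z < 0)

# First matching condition wins; the last argument is the default
df["CloseAdj2"] = np.select([cond1, cond2],
                            [df.Close2 * df.trancosshort,
                             df.Close1 * df.trancosshort], df.Close2)
df["CloseAdj1"] = np.select([cond1, cond2],
                            [df.Close1 * df.trancostlong,
                             df.Close2 * df.trancostlong], df.Close1)
print(df[["CloseAdj1", "CloseAdj2"]].values.tolist())
# [[30.0, 40.0], [60.0, 20.0], [10.0, 20.0]]
```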


Dataframe iteration using Numba instead of itertuples() for faster code

My problem
I have three dataframes which I loop through using itertuples.
Itertuples worked well for a time, but now I am running too many iterations for it to be efficient enough.
I'd like to use vectorisation or perhaps Numba, as I have heard that both are very fast, but I can't figure out how to make them work.
All three dataframes are Open, High, Low, Close candlestick data with a few other columns, e.g. 'FG_Top'.
The dataframes are
dflong - 15 minute candlestick data
dfshort - 5 minute candlestick data
dfshorter - 1 minute candlestick data
Dataframe creation code as requested in comments
import numpy as np
import pandas as pd
idx15m = ['2022-10-29 06:59:59.999', '2022-10-29 07:14:59.999', '2022-10-29 07:29:59.999', '2022-10-29 07:44:59.999',
'2022-10-29 07:59:59.999', '2022-10-29 08:14:59.999', '2022-10-29 08:29:59.999']
opn15m = [19010, 19204, 19283, 19839, 19892, 20000, 20192]
hgh15m = [19230, 19520, 19921, 19909, 20001, 20203, 21065]
low15m = [18782, 19090, 19245, 19809, 19256, 19998, 20016]
cls15m = [19204, 19283, 19839, 19892, 20000, 20192, 20157]
FG_Bottom = [np.nan, np.nan, np.nan, np.nan, np.nan, 19909, np.nan]
FG_Top = [np.nan, np.nan, np.nan, np.nan, np.nan, 19998, np.nan]
dflong = pd.DataFrame({'Open': opn15m, 'High': hgh15m, 'Low': low15m, 'Close': cls15m, 'FG_Bottom': FG_Bottom, 'FG_Top': FG_Top},
index=idx15m)
idx5m = ['2022-10-29 06:59:59.999', '2022-10-29 07:05:59.999', '2022-10-29 07:10:59.999', '2022-10-29 07:15:59.999',
'2022-10-29 07:20:59.999', '2022-10-29 07:25:59.999', '2022-10-29 07:30:59.999']
opn5m = [19012, 19102, 19165, 19747, 19781, 20009, 20082]
hgh5m = [19132, 19423, 19817, 19875, 20014, 20433, 21068]
low5m = [18683, 19093, 19157, 19758, 19362, 19893, 20018]
cls5m = [19102, 19165, 19747, 19781, 20009, 20082, 20154]
price_end5m = [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]
dfshort = pd.DataFrame({'Open': opn5m, 'High': hgh5m, 'Low': low5m, 'Close': cls5m, 'price_end': price_end5m},
index=idx5m)
idx1m = ['2022-10-29 06:59:59.999', '2022-10-29 07:01:59.999', '2022-10-29 07:02:59.999', '2022-10-29 07:03:59.999',
'2022-10-29 07:04:59.999', '2022-10-29 07:05:59.999', '2022-10-29 07:06:59.999']
opn1m = [19010, 19104, 19163, 19748, 19783, 20000, 20087]
hgh1m = [19130, 19420, 19811, 19878, 20011, 20434, 21065]
low1m = [18682, 19090, 19154, 19754, 19365, 19899, 20016]
cls1m = [19104, 19163, 19748, 19783, 20000, 20087, 20157]
price_end1m = [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]
dfshorter = pd.DataFrame({'Open': opn1m, 'High': hgh1m, 'Low': low1m, 'Close': cls1m, 'price_end': price_end1m},
index=idx1m)
This gives 3 DataFrames similar to the following example DataFrame:
Example Dataframe
Open High ... FG_Top FG_Bottom
2022-10-29 06:59:59.999 20687.83 20700.46 ... NaN NaN
2022-10-29 07:14:59.999 20686.82 20695.74 ... NaN NaN
2022-10-29 07:29:59.999 20733.62 20745.30 ... 20733.62 20700.46
2022-10-29 07:44:59.999 20741.42 20762.75 ... NaN NaN
2022-10-29 07:59:59.999 20723.86 20777.00 ... NaN NaN
... ... ... ... ... ...
2022-11-10 02:14:59.999 16140.29 16167.09 ... NaN NaN
2022-11-10 02:29:59.999 16119.99 16195.19 ... NaN NaN
2022-11-10 02:44:59.999 16136.63 16263.15 ... NaN NaN
2022-11-10 02:59:59.999 16238.91 16238.91 ... NaN NaN
2022-11-10 03:14:59.999 16210.23 16499.00 ... NaN NaN
Code explanation:
I loop over my first dataframe with an outer loop, then loop over it again with a nested inner loop. On each iteration, if statements check certain conditions, and when those conditions are met I set some values in the first dataframe to np.nan.
One of the conditions checked in the second loop calls a function which contains a third loop and checks for certain conditions in the other two dataframes.
from datetime import timedelta

# First loop
for fg_candle_idx, row in enumerate(dflong.itertuples()):
    top = row.FG_Top
    bottom = row.FG_Bottom
    fg_candle_time = row.Index
    if pd.notnull(top):
        # Second loop
        for future_candle_idx, r in enumerate(dflong.itertuples()):
            future_candle_time = r.Index
            next_future_candle = future_candle_time + timedelta(minutes=minutes)
            future_candle_high = r.High
            future_candle_low = r.Low
            future_candle_close = r.Close
            future_candle_open = r.Open
            if future_candle_idx > fg_candle_idx:
                div = r.price_end
                # Check conditions, call function check_no_divs
                if (pd.isnull(check_no_divs(dfshort, future_candle_time, next_future_candle))) & (
                        pd.isnull(check_no_divs(dfshorter, future_candle_time, next_future_candle))) & (
                        pd.isnull(div)):
                    if future_candle_high < bottom:
                        continue
                    elif future_candle_low > top:
                        continue
                    elif (future_candle_close < bottom) & \
                            (future_candle_open > top):
                        dflong.loc[fg_candle_time, 'FG_Bottom'] = np.nan
                        dflong.loc[fg_candle_time, 'FG_Top'] = np.nan
                        continue
                    # Many additional conditions checked...
The following code is the function check_no_divs
def check_no_divs(df, candle_time, next_candle):
    no_divs = []
    # Third loop
    for idx, row in enumerate(df.itertuples()):
        compare_candle_time = row.Index
        div = row.price_end
        if (compare_candle_time >= candle_time) & (compare_candle_time <= next_candle):
            if pd.notnull(div):
                no_divs.append(True)
            else:
                no_divs.append(False)
        elif compare_candle_time < candle_time:
            continue
        elif compare_candle_time > next_candle:
            break
    if not all(no_divs):
        return np.nan
    elif any(no_divs):
        return 1
Ideal Solution
Clearly using itertuples is far too inefficient for this problem. I think there would be a much faster solution using efficient vectorisation or Numba.
Does anyone know how to make this work?
P.S. I'm still quite new to coding. I think my current code could be made more efficient while still using itertuples, but probably not efficient enough. I'd appreciate it if someone knows a way to greatly increase the speed of this code.
I spent a lot of time researching and testing different code and came up with this solution using Numba, which gives a significant speed boost.
First, import the required libraries:
import numpy as np
import pandas as pd
from numba import njit, prange
Then define the function using Numba's @njit decorator:
@njit
def filled_fg(fg_top, fg_bottom, dflongindex, Open, High, Low, Close, dflongprice_end,
              dfshortprice_end, shortindex, dfshorterprice_end, shorterindex):
    # First loop
    for i in prange(len(fg_top)):
        top = fg_top[i]
        bottom = fg_bottom[i]
        if not np.isnan(top):  # `x is not np.nan` does not work on array elements; use np.isnan
            if (bottom - top) > 0:
                fg_top[i] = np.nan
                fg_bottom[i] = np.nan
            # Second loop
            for j in prange(len(fg_top)):
                if j > i:
                    future_candle_time = dflongindex[j]
                    next_future_candle = dflongindex[j + 1]
                    future_candle_high = High[j]
                    future_candle_low = Low[j]
                    future_candle_close = Close[j]
                    future_candle_open = Open[j]
                    long_div = dflongprice_end[j]
                    # Check conditions (`x == np.nan` is always False, so test with np.isnan)
                    if np.isnan(check_no_divs(dfshortprice_end, shortindex,
                                              future_candle_time, next_future_candle)) and \
                            np.isnan(check_no_divs(dfshorterprice_end, shorterindex,
                                                   future_candle_time, next_future_candle)) and \
                            np.isnan(long_div):
                        if future_candle_high < bottom:
                            continue
                        elif future_candle_low > top:
                            continue
                        # Do something when conditions are met...
                        elif (future_candle_close < bottom) and \
                                (future_candle_open > top):
                            fg_bottom[i] = np.nan
                            fg_top[i] = np.nan
                            continue
    return fg_top, fg_bottom
Define the second function, also with Numba's @njit decorator:
@njit
def check_no_divs(div_data, div_candle_time, first_future_candle, second_future_candle):
    no_divs = []
    for i in prange(len(div_data)):
        if (div_candle_time[i] >= first_future_candle) and (div_candle_time[i] <= second_future_candle):
            if not np.isnan(div_data[i]):
                return 1.0  # float, so every return path shares one type
            else:
                no_divs.append(0)
        elif div_candle_time[i] < first_future_candle:
            continue
        elif div_candle_time[i] > second_future_candle:
            break
    div_count = 0
    for i in no_divs:
        div_count = div_count + i
    if div_count == 0:
        return np.nan
    return np.nan  # every code path must return a value under njit
Before calling the function, the DataFrame indexes need to be reset:
dflong = dflong.reset_index()
dfshort = dfshort.reset_index()
dfshorter = dfshorter.reset_index()
Now call the function, using .values to pass a NumPy representation of each column (the unpacking order matches the function's return order):
fg_top, fg_bottom = filled_fg(dflong['FG_Top'].values,
                              dflong['FG_Bottom'].values,
                              dflong['index'].values,
                              dflong['Open'].values,
                              dflong['High'].values,
                              dflong['Low'].values,
                              dflong['Close'].values,
                              dflong['price_end'].values,
                              dfshort['price_end'].values,
                              dfshort['index'].values,
                              dfshorter['price_end'].values,
                              dfshorter['index'].values)
Finally, the returned arrays need to be added back to the original DataFrame dflong:
dflong['FG_Bottom'] = fg_bottom
dflong['FG_Top'] = fg_top
Speed test results:
Original itertuples solution = 7.641393423080444 seconds
New Numba solution = 0.5985264778137207 seconds
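For reference, the core pattern in the answer above — passing plain NumPy arrays into an @njit function and testing for missing values with np.isnan rather than `== np.nan` — can be sketched minimally; the try/except fallback here is only so the snippet also runs where numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # fall back to plain Python so the sketch still runs without numba
    def njit(f):
        return f

@njit
def count_non_nan(price_end):
    # count entries that hold a real value; np.isnan works in nopython mode,
    # whereas `x == np.nan` would always be False
    n = 0
    for i in range(price_end.shape[0]):
        if not np.isnan(price_end[i]):
            n += 1
    return n

arr = np.array([np.nan, 1.0, np.nan, 2.5])
print(count_non_nan(arr))  # 2
```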

Unable to Group dataframe by Month number

I have the following code, but it seems the line
cs.groupby(cs['Disbursal_Date'].dt.strftime('%B'))['Revenue'].sum()
just returns the entire dataframe without the data being grouped by month.
Any help is much appreciated.
import pandas as pd
import os
import glob
import numpy as np
os.chdir("C:/csv/")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
cs = pd.concat([pd.read_csv(f) for f in all_filenames])
cs.drop(cs.columns[[0]], axis=1, inplace=True)
cs = cs[cs["Booked"] == 1]
cs['Disbursal_Date'] = pd.to_datetime(cs['Disbursal_Date'])
cs.drop_duplicates(inplace=True)
cs['Revenue'] = np.where(cs['Loan_Amount'] < 1000, 28,
np.where((cs['Loan_Amount'] > 1000) & (cs['APR'] < 0.3), 0.0525 * cs['Loan_Amount'],
np.where((cs['Loan_Amount'] > 1000) & (cs['APR'] > 0.3), 0.0275 * cs['Loan_Amount'], 0)))
cs.loc[cs.Revenue >= 175, "Revenue"] = 175
cs.loc[cs.Revenue <= 52.50, "Revenue"] = 52.50
cs.groupby(cs['Disbursal_Date'].dt.strftime('%B'))['Revenue'].sum()
print(cs)
You're not assigning the result of your cs.groupby call. Something like:
cs = cs.groupby(cs['Disbursal_Date'].dt.strftime('%B'))['Revenue'].sum()
print(cs)
should do the trick.
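Since the title asks for the month number, note also that dt.month groups by number (so groups sort chronologically), whereas strftime('%B') groups by month name (which sorts alphabetically). A small sketch with made-up data standing in for the CSVs:

```python
import pandas as pd

# Made-up data standing in for the concatenated CSV files
cs = pd.DataFrame({"Disbursal_Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-03-02"]),
                   "Revenue": [100.0, 50.0, 75.0]})

# dt.month yields the month number (1-12)
monthly = cs.groupby(cs["Disbursal_Date"].dt.month)["Revenue"].sum()
print(monthly.to_dict())  # {1: 150.0, 3: 75.0}
```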

Coding question based on DataFrames, Series, and dictionaries

It would be interesting to see if there is any evidence of a link between vaccine effectiveness and sex of the child. Calculate the ratio of the number of children who contracted chickenpox but were vaccinated against it (at least one varicella dose) versus those who were vaccinated but did not contract chicken pox. Return results by sex.
This function should return a dictionary in the form of (use the correct numbers):
{"male":0.2,
"female":0.4}
Note: To aid in verification, the chickenpox_by_sex()['female'] value the autograder is looking for starts with the digits 0.0077.
Please write functioning code for the same.
Try the following code. First, read the given dataset:
import pandas as pd
df=pd.read_csv('assets/NISPUF17.csv',index_col=0)
df
Main code:
def chickenpox_by_sex():
    male_df = df[df['SEX'] == 1]
    vac_m = male_df[male_df['P_NUMVRC'] >= 1]
    cp_m = vac_m[vac_m['HAD_CPOX'] == 1]
    counts_cp_m = cp_m['SEX'].count()
    ncp_m = vac_m[vac_m['HAD_CPOX'] == 2]
    counts_ncp_m = ncp_m['SEX'].count()
    male = counts_cp_m / counts_ncp_m
    female_df = df[df['SEX'] == 2]
    vac_f = female_df[female_df['P_NUMVRC'] >= 1]
    cp_f = vac_f[vac_f['HAD_CPOX'] == 1]
    counts_cp_f = cp_f['SEX'].count()
    ncp_f = vac_f[vac_f['HAD_CPOX'] == 2]
    counts_ncp_f = ncp_f['SEX'].count()
    female = counts_cp_f / counts_ncp_f
    ratio_dict = {"male": male, "female": female}
    return ratio_dict
Check using the following code:
chickenpox_by_sex()['female']
Finally, verify the shape of the result:
assert len(chickenpox_by_sex()) == 2, "Return a dictionary with two items, the first for males and the second for females."
=> [SEX] -> sex=1 (male); sex=2 (female)
=> [HAD_CPOX] -> contracted chickenpox = 1; did not contract chickenpox = 2
=> [P_NUMVRC] >= 1 -> given one or more varicella doses
ratio(male) = (vaccinated and contracted chickenpox) / (vaccinated and did not contract chickenpox)
ratio(female) = (vaccinated and contracted chickenpox) / (vaccinated and did not contract chickenpox)
Variable names:
male_df - male dataframe
vac_m - vaccinated males
cp_m - vaccinated males who contracted chickenpox
counts_cp_m - count of vaccinated males who contracted chickenpox
ncp_m - vaccinated males who did not contract chickenpox
counts_ncp_m - count of vaccinated males who did not contract chickenpox
Similarly for females.
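The ratios defined above can also be computed with a single groupby. The frame below is a made-up stand-in for NISPUF17.csv, using the column meanings just listed:

```python
import pandas as pd

# Made-up stand-in for NISPUF17.csv: SEX 1=male/2=female, HAD_CPOX 1=yes/2=no
df = pd.DataFrame({"SEX":      [1, 1, 1, 2, 2, 2, 2],
                   "HAD_CPOX": [1, 2, 2, 1, 2, 2, 2],
                   "P_NUMVRC": [1, 1, 0, 2, 1, 1, 0]})

vac = df[df["P_NUMVRC"] >= 1]                     # at least one varicella dose
counts = vac.groupby(["SEX", "HAD_CPOX"]).size()  # contingency counts
ratios = {"male":   counts[(1, 1)] / counts[(1, 2)],
          "female": counts[(2, 1)] / counts[(2, 2)]}
print(ratios)  # {'male': 1.0, 'female': 0.5}
```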
Correct solution:
def chickenpox_by_sex():
    import pandas as pd
    df = pd.read_csv("NISPUF17.csv")
    maleDf = df[df["SEX"] == 1]
    doses1 = maleDf[maleDf["P_NUMVRC"] >= 1]
    chickenPox1_1 = doses1[doses1["HAD_CPOX"] == 1]
    count1_1 = chickenPox1_1["SEX"].count()
    chickenPox1_2 = doses1[doses1["HAD_CPOX"] == 2]
    count1_2 = chickenPox1_2["SEX"].count()
    resultMale = count1_1 / count1_2
    femaleDf = df[df["SEX"] == 2]
    doses2 = femaleDf[femaleDf["P_NUMVRC"] >= 1]
    chickenPox2_1 = doses2[doses2["HAD_CPOX"] == 1]
    count2_1 = chickenPox2_1["SEX"].count()
    chickenPox2_2 = doses2[doses2["HAD_CPOX"] == 2]
    count2_2 = chickenPox2_2["SEX"].count()
    resultFemale = count2_1 / count2_2
    return {"male": resultMale,
            "female": resultFemale}
The following code works as well:
import pandas as pd
import numpy as np

def chickenpox_by_sex():
    df = pd.read_csv('assets/NISPUF17.csv')
    c_vaccinated = df[df['P_NUMVRC'] > 0]
    menstats = c_vaccinated[c_vaccinated['SEX'] == 1]
    mnocpox = len(menstats[menstats['HAD_CPOX'] == 2])
    menratio = len(menstats[menstats['HAD_CPOX'] == 1]) / mnocpox
    wstats = c_vaccinated[c_vaccinated['SEX'] == 2]
    wnocpox = len(wstats[wstats['HAD_CPOX'] == 2])
    wratio = len(wstats[wstats['HAD_CPOX'] == 1]) / wnocpox
    ratios = {'male': menratio, 'female': wratio}
    return ratios

chickenpox_by_sex()
import pandas as pd

def chickenpox_by_sex():
    df = pd.read_csv('assets/NISPUF17.csv')
    df = df.drop(df[df.HAD_CPOX == 77].index)
    df = df.drop(df[df.HAD_CPOX == 99].index)
    df = df.dropna(subset=['P_NUMVRC'])
    df.loc[df['HAD_CPOX'] == 1, 'HAD_CPOX'] = 'YES'
    df.loc[df['HAD_CPOX'] == 2, 'HAD_CPOX'] = 'NO'
    df.loc[df['SEX'] == 1, 'SEX'] = 'male'
    df.loc[df['SEX'] == 2, 'SEX'] = 'female'
    df.loc[df['P_NUMVRC'] == 2.0, 'P_NUMVRC'] = 1
    df.loc[df['P_NUMVRC'] == 3.0, 'P_NUMVRC'] = 1
    df = df[['SEX', 'P_NUMVRC', 'HAD_CPOX']].round(decimals=0)
    dfm = df[df['SEX'] == 'male']
    dfmVac = dfm[dfm['P_NUMVRC'] == 1.0]
    mPoxVacYes = len(dfmVac[dfmVac['HAD_CPOX'] == 'YES'])
    mPoxVacNo = len(dfmVac[dfmVac['HAD_CPOX'] == 'NO'])
    dff = df[df['SEX'] == 'female']
    dffVac = dff[dff['P_NUMVRC'] == 1.0]
    fPoxVacYes = len(dffVac[dffVac['HAD_CPOX'] == 'YES'])
    fPoxVacNo = len(dffVac[dffVac['HAD_CPOX'] == 'NO'])
    ratioM = mPoxVacYes / float(mPoxVacNo)
    ratioF = fPoxVacYes / float(fPoxVacNo)
    # note: this returns percentages (ratio * 100), not the raw ratios
    result = {'male': ratioM * 100, 'female': ratioF * 100}
    return result
import pandas as pd
import numpy as np

df = pd.read_csv('assets/NISPUF17.csv', usecols=['HAD_CPOX', 'SEX', 'P_NUMVRC']).dropna().reset_index()

def chickenpox_by_sex():
    girls = df[df.SEX == 2]
    girls_had = girls[(girls.HAD_CPOX == 1) & (girls.P_NUMVRC > 0.0)]
    girls_not_had = girls[(girls.HAD_CPOX == 2) & (girls.P_NUMVRC > 0.0)]
    girls_ratio = len(girls_had) / len(girls_not_had)
    boys = df[df.SEX == 1]
    boys_had = boys[(boys.HAD_CPOX == 1) & (boys.P_NUMVRC > 0.0)]
    boys_not_had = boys[(boys.HAD_CPOX == 2) & (boys.P_NUMVRC > 0.0)]
    boys_ratio = len(boys_had) / len(boys_not_had)
    result = {"male": round(boys_ratio, ndigits=4),
              "female": round(girls_ratio, ndigits=4)}
    return result

chickenpox_by_sex()

Pandas accumulate data for linear regression

I am trying to adjust my data so total_gross per day is accumulated. E.g.

Created   total_gross   total_gross_accumulated
Day 1     100           100
Day 2     100           200
Day 3     100           300
Day 4     100           400
Any idea how I have to change my code to make total_gross_accumulated available?
Here is my data.
My code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

def load_event_data():
    df = pd.read_csv('sample-data.csv', usecols=['created', 'total_gross'])
    df['created'] = pd.to_datetime(df.created)
    return df.set_index('created').resample('D').sum().fillna(0)

event_data = load_event_data()
X = event_data.index
y = event_data.total_gross
plt.xticks(rotation=90)
plt.plot(X, y)
plt.show()
A list comprehension can do this, although pandas has a built-in that is both simpler and faster.
SHORT answer:
This should give you the new column that you want:

n = event_data.shape[0]
# skip line 0 and accumulate from 1 until the end
# (note this re-sums the whole prefix on every step, i.e. O(n^2))
total_gross_accumulated = [event_data['total_gross'][:i].sum() for i in range(1, n + 1)]
# add the new variable to the initial pandas dataframe
event_data['total_gross_accumulated'] = total_gross_accumulated
Or, faster and more idiomatic:
event_data['total_gross_accumulated'] = event_data['total_gross'].cumsum()
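On the Day 1-4 example from the question (toy numbers, not the real CSV), cumsum reproduces the desired column:

```python
import pandas as pd

# Toy frame matching the Day 1-4 example in the question
event_data = pd.DataFrame({"total_gross": [100, 100, 100, 100]},
                          index=["Day 1", "Day 2", "Day 3", "Day 4"])
# cumsum computes the running total in a single O(n) pass
event_data["total_gross_accumulated"] = event_data["total_gross"].cumsum()
print(event_data["total_gross_accumulated"].tolist())  # [100, 200, 300, 400]
```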
LONG answer:
Full code using your data:

import pandas as pd
import matplotlib.pyplot as plt

def load_event_data():
    df = pd.read_csv('sample-data.csv', usecols=['created', 'total_gross'])
    df['created'] = pd.to_datetime(df.created)
    return df.set_index('created').resample('D').sum().fillna(0)

event_data = load_event_data()
n = event_data.shape[0]
# skip line 0 and accumulate from 1 until the end
total_gross_accumulated = [event_data['total_gross'][:i].sum() for i in range(1, n + 1)]
# add the new variable to the initial pandas dataframe
event_data['total_gross_accumulated'] = total_gross_accumulated
Results:
event_data.head(6)
# total_gross total_gross_accumulated
#created
#2019-03-01 3481810 3481810
#2019-03-02 4690 3486500
#2019-03-03 0 3486500
#2019-03-04 0 3486500
#2019-03-05 0 3486500
#2019-03-06 0 3486500
X = event_data.index
y = event_data.total_gross_accumulated
plt.xticks(rotation=90)
plt.plot(X, y)
plt.show()

Time Difference between Time Period and Instant

I have some time periods (df_A) and some time instants (df_B):
import pandas as pd
import numpy as np
import datetime as dt
from datetime import timedelta
# Data
df_A = pd.DataFrame({'A1': [dt.datetime(2017,1,5,9,8), dt.datetime(2017,1,5,9,9), dt.datetime(2017,1,7,9,19), dt.datetime(2017,1,7,9,19), dt.datetime(2017,1,7,9,19), dt.datetime(2017,2,7,9,19), dt.datetime(2017,2,7,9,19)],
'A2': [dt.datetime(2017,1,5,9,9), dt.datetime(2017,1,5,9,12), dt.datetime(2017,1,7,9,26), dt.datetime(2017,1,7,9,20), dt.datetime(2017,1,7,9,21), dt.datetime(2017,2,7,9,23), dt.datetime(2017,2,7,9,25)]})
df_B = pd.DataFrame({ 'B': [dt.datetime(2017,1,6,14,45), dt.datetime(2017,1,4,3,31), dt.datetime(2017,1,7,3,31), dt.datetime(2017,1,7,14,57), dt.datetime(2017,1,9,14,57)]})
I can match these together:
# Define an Extra Margin
M = dt.timedelta(days = 10)
df_A["A1X"] = df_A["A1"] + M
df_A["A2X"] = df_A["A2"] - M
# Match
Bv = df_B.B.values
A1 = df_A.A1X.values
A2 = df_A.A2X.values
i, j = np.where((Bv[:, None] >= A1) & (Bv[:, None] <= A2))
df_C = pd.DataFrame(np.column_stack([df_B.values[i], df_A.values[j]]),
                    columns=df_B.columns.append(df_A.columns))
I would like to find the time difference between each time period and the time instant matched to it. I mean that if B is between A1 and A2, then dT = 0.
I've tried doing it like this:
# Calculate dt
def time(A1, A2, B):
    if df_C["B"] < df_C["A1"]:
        return df_C["A1"].subtract(df_C["B"])
    elif df_C["B"] > df_C["A2"]:
        return df_C["B"].subtract(df_C["A2"])
    else:
        return 0

df_C['dt'] = df_C.apply(time)
I'm getting "ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series"
So, I found two fixes:
You are adding M to the lower bound and subtracting it from the upper one, which shrinks the window instead of widening it. Change it to:

df_A['A1X'] = df_A['A1'] - M
df_A['A2X'] = df_A['A2'] + M
Your time function should receive one row of the dataframe at a time, so it should be something like:

def time(row):
    if row['B'] < row['A1']:
        return row['A1'] - row['B']
    elif row['B'] > row['A2']:
        return row['B'] - row['A2']
    else:
        return pd.Timedelta(0)  # a Timedelta keeps the column a single dtype

And then you can call it with axis=1 so apply passes rows rather than columns:

df_C['dt'] = df_C.apply(time, axis=1) :)
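An aside: since all the comparisons are vectorizable, the apply can be avoided entirely with boolean masks; a sketch on a hypothetical two-row df_C:

```python
import pandas as pd

# Hypothetical matched frame standing in for df_C
df_C = pd.DataFrame({"A1": pd.to_datetime(["2017-01-05 09:00", "2017-01-05 09:00"]),
                     "A2": pd.to_datetime(["2017-01-05 10:00", "2017-01-05 10:00"]),
                     "B":  pd.to_datetime(["2017-01-05 08:30", "2017-01-05 09:30"])})

# Start from zero, then overwrite only the rows where B falls outside [A1, A2]
df_C["dt"] = pd.Timedelta(0)
early = df_C["B"] < df_C["A1"]
late = df_C["B"] > df_C["A2"]
df_C.loc[early, "dt"] = df_C["A1"] - df_C["B"]
df_C.loc[late, "dt"] = df_C["B"] - df_C["A2"]
print(df_C["dt"].tolist())  # [Timedelta('0 days 00:30:00'), Timedelta('0 days 00:00:00')]
```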