How to get the lowest low of a series in PineScript - min

I'm trying to get the lowest low of a series of candles after a condition, but it always returns the last candle of the condition. I try with min(), lowest() and a for loop but it doesn't work. Also try using blackCandle[] and min(ThreeinARow)/lowest(ThreeinARow) and sometimes it returns the last candle and other times it gives me compilation error.
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
SL = ThreeinARow ? min(low[1], low[2], low[3]) : na

//#version=4
study("Help (low after 3DownBar)", overlay=true, max_bars_back=100)
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
bar_ind = barssince(ThreeinARow)
//SL = lowest(max(1, nz(bar_ind))) // the lowest low of a series of candles after the condition
SL = lowest(max(1, nz(bar_ind)+1)) // the lowest low of a series of candles since the condition
plot(SL, style=plot.style_cross, linewidth=3)
bgcolor(ThreeinARow ? color.silver : na)
See also the second solution which is in the commented line

It seems that I was misinterpreting it. Using min() does return the minimum of a series of candles. The detail is that I must enter the specific number of candles that I will use to calculate the minimum, which, for now, does not generate any problem for me. In the end, this is how I ended up writing it:
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
Lowest_Low = if ThreeinARow
min(low[1], low[2], low[3])
plot(Lowest_Low, color=color.red)

Related

Pandas rolling window on an offset between 4 and 2 weeks in the past

I have a datafile with quality scores from different suppliers over a time range of 3 years. The end goal is to use machine learning to predict the quality label (good or bad) of a shipment based on supplier information.
I want to use the mean historic quality data over a specific period of time as an input feature in this model by using pandas rolling window. the problem with this method is that pandas only allows you to create a window from t=0-x until t=0 for you rolling window as presented below:
df['average_score t-2w'] = df['score'].rolling(window='14d',closed='left').mean()
And this is were the problem comes in. For my feature I want to use quality data from a period of 2 weeks, but these 2 weeks are not the 2 weeks before the corresponding shipment, but of 2 weeks, starting from t=-4weeks , and ending on t=-2weeks.
You would imagine that this could be solved by using the same string of code but changing the window as presented below:
df['average_score t-2w'] = df['score'].rolling(window='28d' - '14d',closed='left').mean()
This, or any other type of denotation of this specific window does not seem to work.
It seems like pandas does not offer a solution to this problem, so we made a work around it with the following solution:
def time_shift_week(df):
def _avg_score_interval_func(series):
current_time = series.index[-1]
result = series[(series.index > ( current_time- pd.Timedelta(value=4, unit='w')))
& (series.index < (current_time - pd.Timedelta(value=2, unit='w')))]
return result.mean() if len(result)>0 else 0.0
temp_df = df.groupby(by=["supplier", "timestamp"], as_index=False).aggregate({"score": np.mean}).set_index('timestamp')
temp_df["w-42"] = (
temp_df
.groupby(["supplier"])
.ag_score
.apply(lambda x:
x
.rolling(window='30D', closed='both')
.apply(_avg_score_interval_func)
))
return temp_df.reset_index()
This results in a new df in which we find the average score score per supplier per timestamp, which we can subsequently merge with the original data frame to obtain the new feature.
Doing it this way seems really cumbersome and overly complicated for the task I am trying to perform. Eventhough we have found a workaround, I am wondering if there is an easier method of doing this.
Is anyone aware of a less complicated way of performing this rolling window feature extraction?
While pandas does not have the custom date offset you need, calculating the mean is pretty simple: it's just sum divided by count. You can subtract the 14-day rolling window from the 28-day rolling window:
# Some sample data. All scores are sequential for easy verification
idx = pd.MultiIndex.from_product(
[list("ABC"), pd.date_range("2020-01-01", "2022-12-31")],
names=["supplier", "timestamp"],
)
df = pd.DataFrame({"score": np.arange(len(idx))}, index=idx).reset_index()
# Now we gonna do rolling avg on score with the custom window.
# closed=left mean the current row will be excluded from the window.
score = df.set_index("timestamp").groupby("supplier")["score"]
r28 = score.rolling("28d", closed="left")
r14 = score.rolling("14d", closed="left")
avg_score = (r28.sum() - r14.sum()) / (r28.count() - r14.count())

Trigger Point of Moving average crossover

I am trying to define the trigger point when wt1(Moving average 1) crosses over wt2(moving average 2) and add it to the column ['side'].
So basically add 1 to side at the moment wt1 crosses above wt2.
This is the current code I am using but doesn't seem to be working.
for i in range(len(df)):
if df.wt1.iloc[i] > df.wt2.iloc[i] and df.wt1.iloc[i-1] < df.wt2.iloc[i-1]:
df.side.iloc[1]
If I do the following:
long_signals = (df.wt1 > df.wt2)
df.loc[long_signals, 'side'] = 1
it return the value of 1 the entire time wt1 is above wt2, which is not what i am trying to do.
Expected outcome is when wt1 crosses above wt2 side should be labeled as 1.
Help would be appreciated!
Use shift in your condition:
long_signals = (df.wt1 > df.wt2) & (df.wt1.shift() <= df.wt2.shift())
df.loc[long_signals, 'side'] = 1
df
if you do not like NaNs in 'side', use df.fillna(0) at the end
Your first piece of code also works with the following small modification
for i in range(len(df)):
if df.wt1.iloc[i] > df.wt2.iloc[i] and df.wt1.iloc[i-1] <= df.wt2.iloc[i-1]:
df.loc[i,'side'] = 1

How can I optimize my for loop in order to be able to run it on a 320000 lines DataFrame table?

I think I have a problem with time calculation.
I want to run this code on a DataFrame of 320 000 lines, 6 columns:
index_data = data["clubid"].index.tolist()
for i in index_data:
for j in index_data:
if data["clubid"][i] == data["clubid"][j]:
if data["win_bool"][i] == 1:
if (data["startdate"][i] >= data["startdate"][j]) & (
data["win_bool"][j] == 1
):
NW_tot[i] += 1
else:
if (data["startdate"][i] >= data["startdate"][j]) & (
data["win_bool"][j] == 0
):
NL_tot[i] += 1
The objective is to determine the number of wins and the number of losses from a given match taking into account the previous match, this for every clubid.
The problem is, I don't get an error, but I never obtain any results either.
When I tried with a smaller DataFrame ( data[0:1000] ) I got a result in 13 seconds. This is why I think it's a time calculation problem.
I also tried to first use a groupby("clubid"), then do my for loop into every group but I drowned myself.
Something else that bothers me, I have at least 2 lines with the exact same date/hour, because I have at least two identical dates for 1 match. Because of this I can't put the date in index.
Could you help me with these issues, please?
As I pointed out in the comment above, I think you can simply sum the vector of win_bool by group. If the dates are sorted this should be equivalent to your loop, correct?
import pandas as pd
dat = pd.DataFrame({
"win_bool":[0,0,1,0,1,1,1,0,1,1,1,1,1,1,0],
"clubid": [1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
"date" : [1,2,1,2,3,4,5,1,2,1,2,3,4,5,6],
"othercol":["a","b","b","b","b","b","b","b","b","b","b","b","b","b","b"]
})
temp = dat[["clubid", "win_bool"]].groupby("clubid")
NW_tot = temp.sum()
NL_tot = temp.count()
NL_tot = NL_tot["win_bool"] - NW_tot["win_bool"]
If you have duplicate dates that inflate the counts, you could first drop duplicates by dates (within groups):
# drop duplicate dates
temp = dat.drop_duplicates(["clubid", "date"])[["clubid", "win_bool"]].groupby("clubid")

Comparing timedelta fields

I am looking at file delivery times and can't work out how to compare two timedelta fields using a for loop if statement.
time_diff is the difference between cob_date and last_update_time
average_diff is based on the average for a particular file
I want to find the delay for each row.
I have been able to produce a column delay using average_diff - time_diff
However, when the average_diff - time_diff < 0 I just want to return delay = 0 as this is not a delay.
I have made a for loop but this isn't working and I don't know why. I'm sure the answer is very simple but I can't get there.
test_pv_import_v2['delay2'] = pd.to_timedelta('0')
for index, row in test_pv_import_v2.iterrows():
if test_pv_import_v2['time_diff'] > test_pv_import_v2['average_diff'] :
test_pv_import_v2['delay2'] = test_pv_import_v2['time_diff'] - test_pv_import_v2['average_diff']
Use Series.where for set 0 Timedelta by condition:
mask = test_pv_import_v2['time_diff'] > test_pv_import_v2['average_diff']
s = (test_pv_import_v2['time_diff'] - test_pv_import_v2['average_diff'])
test_pv_import_v2['delay2'] = s.where(mask, pd.to_timedelta('0'))

Why even though I sliced my original DataFrame and assigned it to another variable, my original DataFrame still changed values?

I am trying to calculate a portfolio's daily total price, by multiplying weights of each asset with the daily price of the assets.
Currently I have a DataFrame tw which is all zeros except for the dates that I want to re-balance, which holds my assets weights. What I would like to do is for each month, populate the zeros with the weights I am trying to re-balance with, till the next re-balancing date, and so on and so forth.
My code:
df_of_weights = tw.loc[dates_to_rebalance[13]:]
temp_date = dates_to_rebalance[13]
counter = 0
for date in df_of_weights.index:
if date.year == temp_date.year and date.month == temp_date.month:
if date.day == temp_date.day:
pass
else:
df_of_weights.loc[date] = df_of_weights.loc[temp_date].values
counter += 1
temp_date = dates_to_rebalance[13+counter]
I understand that if you slice your DataFrame and assign it to a variable (df_of_weights), changing the values of said variable would not affect the original DataFrame. However, the values in tw changed. Have been searching for an answer online for a while now and am really confused.
You should use copy in order to fix the problem such that:
df_of_weights = tw.loc[dates_to_rebalance[13]:].copy()
The problem is slicing provides view instead of copy. The issue is still open.
https://github.com/pandas-dev/pandas/issues/15631