How to adjust the cumulative period and weighting in Python (Effective Drought Index, EDI: Byun & Wilhite 1999) - numpy

I have written some code, but there is a big problem: it takes far too long to run.
In short, the definition of the EDI (Byun and Wilhite, 1999):
EDI is an accumulated and weighted precipitation index.
pr = precipitation; the accumulation period is 365 days (1 year), and after that the accumulation period is redefined.
The big problem is the redefinition code.
My code is below.
Main function: return_redifine_EP_MEP_DS
Sub functions: return_EP_v2, return_MEP_v2, return_DEP_v2
Additional explanation:
I am using Python with iris, numpy, etc.
EP_cube.data or MEP_cube.data: ".data" gives the underlying numpy arrays.
Data sets: CMIP5 model precipitation (time, latitude, longitude).
origin_pr is precipitation, shape (time, latitude, longitude): 1971~2000
EP_cube is the accumulated precipitation, (time, latitude, longitude): 1972~2000
MEP_cube is the climatological mean (for each calendar day) of EP_cube, (time, latitude, longitude): 365 days
DEP_cube is EP_cube minus the climatological mean of EP_cube (MEP), (time, latitude, longitude): 1972~2000
sy, ey are the climatology start and end years
day is each model's number of days per year (for example HadGEM2-AO: 360 days, ACCESS1-0: 365 days)
def return_redifine_EP_MEP_DS(origin_pr, EP_cube, DEP_cube, sy, ey, day):
    origin_DS = day
    origin_day = day
    DS_cube = EP_cube - EP_cube.data + day
    DS = origin_DS
    o_DEP_cube = DEP_cube.copy()
    return_time = 0
    for j in range(0, DEP_cube.data.shape[1]):
        for k in range(0, DEP_cube.data.shape[2]):
            for i in range(1, DEP_cube.data.shape[0]):
                if DEP_cube.data[i, j, k] < 0 and DEP_cube.data[i-1, j, k] < 0:
                    DS = DS + 1
                else:
                    DS = origin_DS
                if DS != origin_DS:
                    EP_cube.data[i, j, k] = return_EP_v2(origin_pr[i-DS+origin_day:i+origin_day, j, k], DS)
                    day_of_year = DEP_cube[i, j, k].coord('day_of_year').points
                    MEP_cube.data[day_of_year-1, j, k] = return_MEP_v2(EP_cube[:, j, k], sy, ey, day_of_year)
                    DEP_cube.data[i, j, k] = return_DEP_v2(EP_cube[i, j, k], MEP_cube[day_of_year-1, j, k])
    return EP_cube, MEP_cube, DEP_cube, DS_cube
The sub functions are below:
def return_EP_v2(origin_pr, DS):
    weights = np.arange(DS, 0, -1)  # 1/1, 1/2, 1/3 ...
    EP_data = np.sum(origin_pr.data / weights)
    return EP_data

def return_MEP_v2(EP_cube, sy, ey, day_of_year):
    # Extract the climatology years and the wanted day of year (Julian day)
    ex_EP_cube = EP_cube.extract(iris.Constraint(year=lambda t: sy <= t <= ey, day_of_year=day_of_year))
    # Climatological mean of EP_cube
    MEP_data = ex_EP_cube.collapsed('day_of_year', iris.analysis.MEAN).data
    return MEP_data

def return_DEP_v2(EP_cube, MEP_cube):
    DEP_data = EP_cube.data - MEP_cube.data
    return DEP_data
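Most of the cost comes from calling the iris extract/collapsed machinery once per grid point inside the triple loop. For comparison, here is a minimal sketch (not part of the original code) of the same weighted sum computed for every latitude/longitude point at once with plain numpy, for one fixed accumulation length; ep_for_fixed_ds is a made-up helper name and pr is assumed to be a plain (time, lat, lon) array ordered oldest-first, like origin_pr.data:

import numpy as np

def ep_for_fixed_ds(pr, ds):
    """EP[i, j, k]: weighted sum over the ds-day window ending at day i,
    using the same weights (ds, ds-1, ..., 1) as return_EP_v2, but computed
    for all lat/lon points in a single numpy operation per time step."""
    weights = np.arange(ds, 0, -1, dtype=float)
    t, ny, nx = pr.shape
    ep = np.full((t, ny, nx), np.nan)
    for i in range(ds - 1, t):
        window = pr[i - ds + 1:i + 1]                    # shape (ds, ny, nx)
        ep[i] = np.sum(window / weights[:, None, None], axis=0)
    return ep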


Metrics Calculation for Multilabel Classification Problem

I need to understand how this function works for multilabel problems.
I tried to calculate the accuracy manually to reproduce the same result, but I couldn't.
How does it work?
There are 4 labels in this dataset; y_array is the true array and y_pred is the predicted array.
y looks like this:
[0,1,1,1], [1,0,0,0] ...
tp = 0
tn = 0
fn = 0
fp = 0
for i in range(len(y_array)):
    for j in range(4):
        # True
        if (y_array[i][j] == 1) and (y_pred[i][j] == 1):
            tp = tp + 1
        elif (y_array[i][j] == 0) and (y_pred[i][j] == 0):
            tn = tn + 1
        # False
        elif (y_array[i][j] == 0) and (y_pred[i][j] == 1):
            fp = fp + 1
        elif (y_array[i][j] == 1) and (y_pred[i][j] == 0):
            fn = fn + 1
ac = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy", ac)
print('Accuracy: {0}'.format(accuracy_score(y_array, y_pred)))
They are different from each other. How can I calculate accuracy or other metrics for this multilabel problem?
Is it wrong to use the sklearn accuracy metric?
Accuracy 0.9068711367973193
Accuracy: 0.7134998676125521
As per scikit-learn documentation for accuracy_score:
for multilabel classification, this function computes subset accuracy:
the set of labels predicted for a sample must exactly match the
corresponding set of labels in y_true.
This means that each label vector will look something like [0,0,1,0] and y_pred will need to match it exactly to count as a single Positive (so y_pred will need to be [0,0,1,0] as well); anything that isn't [0,0,1,0] will be counted as a single Negative.
In your manual function, you count each partial match separately:
if y_true is [0,0,1,0] and y_pred is [0,1,0,0], you count this as 2 True Negatives (in position 0 and 3), 1 False Positive (position 1) and 1 False Negative (position 2). With the formula you use for accuracy, this results in ac = (0+2)/(0+2+1+1), which gives 50% accuracy, while sklearn.metrics.accuracy_score will be 0%.
If you want to replicate scikit-learn's accuracy_score manually, you would need to first check every member of y_array[i], and only then label the whole sample as one of TP, TN, FP, FN.
However, seeing as you're dealing with multilabel classification, as per the link above, you might want to check out sklearn.metrics.jaccard_score, sklearn.metrics.hamming_loss or sklearn.metrics.zero_one_loss.
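For example, a tiny made-up case shows the gap between the two notions of accuracy:

import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

y_true = np.array([[0, 0, 1, 0],
                   [1, 0, 0, 1]])
y_pred = np.array([[0, 1, 0, 0],    # partial match: 2 of 4 labels correct
                   [1, 0, 0, 1]])   # exact match

# Subset accuracy: only the exactly matching row counts -> 1/2
print(accuracy_score(y_true, y_pred))    # 0.5
# Label-wise agreement (what the manual TP/TN/FP/FN loop measures) -> 6/8
print(1 - hamming_loss(y_true, y_pred))  # 0.75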

How to define prob_threshold to avoid double counting during object detection?

I am developing an object detection application using an SSD model. I have defined the bounding box and the prob_threshold, but when I run the code I realise that the model double-counts people in the frame. Please see my code below.
## Setting prob_threshold for person detection filtering
try:
    prob_threshold = float(os.environ['PROB_THRESHOLD'])
except:
    prob_threshold = 0.4

def draw_boxes(frame, result, width, height):
    """
    Draws a bounding box when a person is detected on the video frame
    and the probability is more than the specified threshold.
    """
    present_count = 0
    for obj in result[0][0]:
        conf = obj[2]
        if conf >= prob_threshold:
            xmin = int(obj[3] * width)
            ymin = int(obj[4] * height)
            xmax = int(obj[5] * width)
            ymax = int(obj[6] * height)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 3)
            present_count += 1
    return frame, present_count
In order to ensure that the number of people in the video frame was not double-counted, I first initialised the variables and then used an if statement to calculate the duration spent by each person in the video frame.
## Initialise variables ##
present_request_id = 0
present_count = 0
start_time = 0
last_count = 0
total_count = 0

## Calculating the duration a person spent on video ##
if present_count < last_count and int(time.time() - start_time) >= 1:
    duration = int(time.time() - start_time)
    if duration > 0:
        # Publish messages to the MQTT server
        client.publish("person/duration",
                       json.dumps({"duration": duration + lagtime}))
    else:
        lagtime += 1
        log.warning(lagtime)
I added the argument below and experimented with the number of seconds; in my case I tried values between 1 sec and 3 sec:
int(time.time() - start_time) >= 1
See the GitHub repo for an explanation.
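For what it's worth, here is a minimal sketch (not from the repo; STABLE_FRAMES and update_counts are made-up names) of the usual debouncing idea: only accept a change in present_count after it has persisted for a few consecutive frames, so a flickering detection is not counted twice.

import time

STABLE_FRAMES = 3      # assumed debounce window; tune for your video
candidate = 0          # count currently being confirmed
stable_for = 0         # how many consecutive frames it has held

def update_counts(present_count, last_count, total_count, start_time):
    """Treat a change in present_count as real only after it has persisted
    for STABLE_FRAMES consecutive frames; short flickers are ignored."""
    global candidate, stable_for
    if present_count == candidate:
        stable_for += 1
    else:
        candidate = present_count
        stable_for = 0
    if stable_for >= STABLE_FRAMES and candidate > last_count:
        total_count += candidate - last_count   # new people entered the frame
        last_count = candidate
        start_time = time.time()
    elif stable_for >= STABLE_FRAMES and candidate < last_count:
        last_count = candidate                  # people left the frame
    return last_count, total_count, start_time

It would be called once per frame with the present_count returned by draw_boxes.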

Dataframe results change to zero after adding return

I am trying to pass "buy_list" in the code below to a dataframe df. This is a small section of the code; when the full code is executed I get the results of a backtest (linked image).
Initial results
replacement_stocks = portfolio_size - len(kept_positions)
buy_list = ranking_table.loc[
    ~ranking_table.index.isin(kept_positions)][:replacement_stocks]
new_portfolio = pd.concat(
    (buy_list,
     ranking_table.loc[ranking_table.index.isin(kept_positions)])
)
When I define df as below, I get a "df not defined" error:
replacement_stocks = portfolio_size - len(kept_positions)
buy_list = ranking_table.loc[
    ~ranking_table.index.isin(kept_positions)][:replacement_stocks]
new_portfolio = pd.concat(
    (buy_list,
     ranking_table.loc[ranking_table.index.isin(kept_positions)])
)
df1 = buy_list  # create df1 with buy_list
df2 = ranking_table.loc[
    ~ranking_table.index.isin(kept_positions)][:replacement_stocks]  # create df2 with buy_list
I tried the solution in the link below:
Similar error with suggested fix
Following this I still get a "df not defined" error, and the output of my backtest changes to 0% in all the months which previously had actual % changes, negative and positive.
replacement_stocks = portfolio_size - len(kept_positions)
buy_list = ranking_table.loc[
    ~ranking_table.index.isin(kept_positions)][:replacement_stocks]
new_portfolio = pd.concat(
    (buy_list,
     ranking_table.loc[ranking_table.index.isin(kept_positions)])
)
return buy_list
df2 = ranking_table.loc[
    ~ranking_table.index.isin(kept_positions)][:replacement_stocks]
print(df2)
This is what I now end up with:
Error message
I'd appreciate any suggestions on how I can fix this.
Thanks,
Last1
Below is the full code as requested; it's from a book I am working through, Trading Evolved by Andreas Clenow.
Thanks again.
%matplotlib inline

import zipline
from zipline.api import order_target_percent, symbol, \
    set_commission, set_slippage, schedule_function, \
    date_rules, time_rules, attach_pipeline, pipeline_output
from pandas import Timestamp
import matplotlib.pyplot as plt
import pyfolio as pf
import pandas as pd
import numpy as np
from scipy import stats
from zipline.finance.commission import PerDollar
from zipline.finance.slippage import VolumeShareSlippage, FixedSlippage
from zipline_norgatedata.pipelines import NorgateDataIndexConstituent
from zipline.pipeline import Pipeline

"""
Model Settings
"""
intial_portfolio = 100000
momentum_window1 = 125
momentum_window2 = 125
minimum_momentum = 40
portfolio_size = 30
vola_window = 20

# Trend filter settings
enable_trend_filter = True
trend_filter_symbol = '$SPXTR'
trend_filter_window = 200

"""
Commission and Slippage Settings
"""
enable_commission = True
commission_pct = 0.001
enable_slippage = True
slippage_volume_limit = 0.025
slippage_impact = 0.05
"""
Helper functions.
"""
def momentum_score(ts):
"""
Input: Price time series.
Output: Annualized exponential regression slope,
multiplied by the R2
"""
# Make a list of consecutive numbers
x = np.arange(len(ts))
# Get logs
log_ts = np.log(ts)
# Calculate regression values
slope, intercept, r_value, p_value, std_err = stats.linregress(x, log_ts)
# Annualize percent
annualized_slope = (np.power(np.exp(slope), 252) - 1) * 100
#Adjust for fitness
score = annualized_slope * (r_value ** 2)
return score
def volatility(ts):
return ts.pct_change().rolling(vola_window).std().iloc[-1]
"""
Initialization and trading logic
"""
def make_pipeline():
indexconstituent = NorgateDataIndexConstituent('$SPX')
return Pipeline(
columns={
'NorgateDataIndexConstituent':indexconstituent},
screen = indexconstituent)
def initialize(context):
attach_pipeline(make_pipeline(), 'norgatedata_pipeline', chunks=9999,eager=True)
# Set commission and slippage.
if enable_commission:
comm_model = PerDollar(cost=commission_pct)
else:
comm_model = PerDollar(cost=0.0)
set_commission(comm_model)
if enable_slippage:
slippage_model=VolumeShareSlippage(volume_limit=slippage_volume_limit, price_impact=slippage_impact)
set_slippage(slippage_model)
else:
slippage_model=FixedSlippage(spread=0.0)
# Used only for progress output.
context.last_month = intial_portfolio
# Store index membership
#context.index_members = pd.read_csv('../data/index_members/sp500.csv', index_col=0, parse_dates=[0])
#Schedule rebalance monthly.
schedule_function(
func=rebalance,
date_rule=date_rules.month_start(),
time_rule=time_rules.market_open()
)
def output_progress(context):
    """
    Output some performance numbers during backtest run
    """
    # Get today's date
    today = zipline.api.get_datetime().date()
    # Calculate percent difference since last month
    perf_pct = (context.portfolio.portfolio_value / context.last_month) - 1
    # Print performance, format as percent with two decimals.
    print("{} - Last Month Result: {:.2%}".format(today, perf_pct))
    # Remember today's portfolio value for next month's calculation
    context.last_month = context.portfolio.portfolio_value
def rebalance(context, data):
    # Write some progress output during the backtest
    output_progress(context)

    context.pipeline_data = pipeline_output('norgatedata_pipeline')
    todays_universe = context.pipeline_data.index

    # Check how long history window we need.
    hist_window = max(momentum_window1,
                      momentum_window2)
    # Get historical data
    hist = data.history(todays_universe, "close", hist_window, "1d")

    # Slice the history to match the two chosen time frames.
    momentum_hist1 = hist[(-1 * momentum_window1):]
    momentum_hist2 = hist[(-1 * momentum_window2):]

    # Calculate momentum values for the two time frames.
    momentum_list1 = momentum_hist1.apply(momentum_score)
    momentum_list2 = momentum_hist2.apply(momentum_score)

    # Now let's put the two momentum values together, and calculate mean.
    momentum_concat = pd.concat((momentum_list1, momentum_list2))
    mom_by_row = momentum_concat.groupby(momentum_concat.index)
    mom_means = mom_by_row.mean()

    # Sort by momentum value.
    ranking_table = mom_means.sort_values(ascending=False)

    """
    Sell Logic
    First we check if any existing position should be sold.
    * Sell if stock is no longer part of index.
    * Sell if stock has too low momentum value.
    """
    kept_positions = list(context.portfolio.positions.keys())
    for security in context.portfolio.positions:
        if (security not in todays_universe):
            order_target_percent(security, 0.0)
            kept_positions.remove(security)
        elif ranking_table[security] < minimum_momentum:
            order_target_percent(security, 0.0)
            kept_positions.remove(security)

    """
    Trend Filter Section
    """
    if enable_trend_filter:
        ind_hist = data.history(
            symbol(trend_filter_symbol),
            'close',
            trend_filter_window,
            '1d'
        )
        trend_filter = ind_hist.iloc[-1] > ind_hist.mean()
        if trend_filter == False:
            return

    """
    Stock Selection Logic
    Check how many stocks we are keeping from last month.
    Fill from top of ranking list, until we reach the
    desired total number of portfolio holdings.
    """
    replacement_stocks = portfolio_size - len(kept_positions)
    buy_list = ranking_table.loc[
        ~ranking_table.index.isin(kept_positions)][:replacement_stocks]
    new_portfolio = pd.concat(
        (buy_list,
         ranking_table.loc[ranking_table.index.isin(kept_positions)])
    )

    """
    Calculate inverse volatility for stocks,
    and make target position weights.
    """
    vola_table = hist[new_portfolio.index].apply(volatility)
    inv_vola_table = 1 / vola_table
    sum_inv_vola = np.sum(inv_vola_table)
    vola_target_weights = inv_vola_table / sum_inv_vola

    for security, rank in new_portfolio.iteritems():
        weight = vola_target_weights[security]
        if security in kept_positions:
            order_target_percent(security, weight)
        else:
            if ranking_table[security] > minimum_momentum:
                order_target_percent(security, weight)
def analyze(context, perf):
    perf['max'] = perf.portfolio_value.cummax()
    perf['dd'] = (perf.portfolio_value / perf['max']) - 1
    maxdd = perf['dd'].min()
    ann_ret = (np.power((perf.portfolio_value.iloc[-1] / perf.portfolio_value.iloc[0]), (252 / len(perf)))) - 1
    print("Annualized Return: {:.2%} Max Drawdown: {:.2%}".format(ann_ret, maxdd))
    return

start_date = Timestamp('2015-01-01', tz='UTC')
end_date = Timestamp('2020-03-14', tz='UTC')

perf = zipline.run_algorithm(
    start=start_date, end=end_date,
    initialize=initialize,
    analyze=analyze,
    capital_base=intial_portfolio,
    data_frequency='daily',
    bundle='norgatedata-sp500')
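For illustration only, a sketch (not from the book) of one possible pattern: a bare return is only valid inside a function, and returning early from rebalance() also skips the ordering code below it, which would match the 0% months. An alternative is to append each month's buy_list to a module-level dict inside rebalance() and build a DataFrame from it after run_algorithm(); buy_history and buy_history_to_frame below are made-up names.

import pandas as pd

buy_history = {}   # made-up module-level store, filled inside rebalance()

# Inside rebalance(), instead of `return buy_list`, one could write:
#     buy_history[zipline.api.get_datetime().date()] = buy_list
# so the selection and ordering logic below it still runs.

def buy_history_to_frame(history):
    """Stack the stored per-rebalance buy_list Series into one DataFrame
    with columns date / security / momentum (illustrative only)."""
    frames = []
    for date, buy_list in history.items():
        frame = buy_list.rename('momentum').rename_axis('security').reset_index()
        frame.insert(0, 'date', date)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)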

Get phase of day based on solar altitude/azimuth

As per the skyfield documentation online, I am able to calculate whether a given time of day is day or night.
import skyfield.api
import skyfield.almanac

ts = skyfield.api.load.timescale()
ephemeris = skyfield.api.load('de421.bsp')
observer = skyfield.api.Topos(latitude_degrees=LAT, longitude_degrees=LON)
is_day_or_night = skyfield.almanac.sunrise_sunset(ephemeris, observer)
day_or_night = is_day_or_night(ts.utc(merged_df.index.to_pydatetime()))
s = pd.Series(data=['Day' if is_day else 'Night' for is_day in day_or_night],
              index=merged_df.index, dtype='category')
merged_df['Day or Night'] = s
Now, I want to also categorize the morning/noon/evening phases of the day according to solar altitude/azimuth. I came up with the following.
earth, sun = ephemeris['earth'], ephemeris['sun']
observer = earth + skyfield.api.Topos(latitude_degrees=LAT,
                                      longitude_degrees=LON)
astrometric = observer.at(ts.utc(merged_df.index.to_pydatetime())).observe(sun)
alt, az, d = astrometric.apparent().altaz()
I need help in understanding how to proceed further since I don't have the related background knowledge about astronomical calculations.
Thanks
As per Brandon's comment, I used the cosine of the zenith angle to get the sunlight intensity and then hacked together the sunlight intensity and the zenith angle to form a monotonically increasing function over the course of the day.
temp = pd.DataFrame({'zenith': 90 - alt.degrees,
                     'Day or Night': day_or_night},
                    index=merged_df.index)
temp['cos zenith'] = np.cos(np.deg2rad(temp['zenith']))
temp['feature'] = temp['cos zenith'].diff(periods=-1) * temp['zenith']
temp['Day Phase'] = None
temp.loc[temp['Day or Night'] == False, 'Day Phase'] = 'Night'
temp.loc[(temp['feature'] > -np.inf) & (temp['feature'] <= -0.035), 'Day Phase'] = 'Morning'
temp.loc[(temp['feature'] > -0.035) & (temp['feature'] <= 0.035), 'Day Phase'] = 'Noon'
temp.loc[(temp['feature'] > 0.035) & (temp['feature'] <= np.inf), 'Day Phase'] = 'Evening'
merged_df['Phase of Day'] = temp['Day Phase']
The limits can be adjusted to change the duration required for noon, etc.
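For comparison, a rough altitude/azimuth-only split is also possible. This is just a sketch with arbitrary thresholds: the 5-degree "noon" band and the east/west split at 180° azimuth are assumptions, and the azimuth rule breaks down at high latitudes.

import pandas as pd

alt_deg = alt.degrees   # from the apparent().altaz() call above
az_deg = az.degrees     # azimuth, measured clockwise from north

phase = pd.Series('Night', index=merged_df.index)
daytime = alt_deg > 0
phase[daytime & (az_deg < 180)] = 'Morning'    # sun east of the meridian
phase[daytime & (az_deg >= 180)] = 'Evening'   # sun west of the meridian
phase[daytime & (alt_deg > alt_deg.max() - 5)] = 'Noon'  # near the highest altitude
merged_df['Phase of Day (alt/az)'] = phase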

matplotlib x-axis ticks dates formatting and locations

I've tried to duplicate plotted graphs originally created with flotr2 for PDF output using matplotlib. I must say that flotr is way easier to use... but that aside, I'm currently stuck trying to format the dates/times on the x-axis to the desired format: hours:minutes with an interval of every 2 hours if the period on the x-axis is less than one day, and year-month-day format with an interval of one day if the period is longer than one day.
I've read through numerous examples and tried to copy them, but the outcome remains the same: hours:minutes:seconds with a 1 to 3 hour interval depending on how long the period is.
My code:
colorMap = {
    'speed': '#3388ff',
    'fuel': '#ffaa33',
    'din1': '#3bb200',
    'din2': '#ff3333',
    'satellites': '#bfbfff'
}
otherColors = ['#00A8F0','#C0D800','#CB4B4B','#4DA74D','#9440ED','#800080','#737CA1','#E4317F','#7D0541','#4EE2EC','#6698FF','#437C17','#7FE817','#FBB117']
plotMap = {}

import datetime
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as dates

fig = plt.figure(figsize=(22, 5), dpi=300, edgecolor='k')
ax1 = fig.add_subplot(111)

realdata = data['data']
keys = realdata.keys()
if 'speed' in keys:
    speed_index = keys.index('speed')
    keys.pop(speed_index)
    keys.insert(0, 'speed')

i = 0
for key in keys:
    if key not in colorMap.keys():
        color = otherColors[i]
        otherColors.pop(i)
        colorMap[key] = color
        i += 1

label = u'%s' % realdata[keys[0]]['name']
ax1.set_ylabel(label)
plotMap[keys[0]] = {}
plotMap[keys[0]]['label'] = label

first_dates = [r[0] for r in realdata[keys[0]]['data']]
date_range = first_dates[-1] - first_dates[0]

ax1.xaxis.reset_ticks()
if date_range > datetime.timedelta(days=1):
    ax1.xaxis.set_major_locator(dates.WeekdayLocator(byweekday=1, interval=1))
    ax1.xaxis.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    ax1.xaxis.set_major_locator(dates.HourLocator(byhour=range(24), interval=2))
    ax1.xaxis.set_major_formatter(dates.DateFormatter('%H:%M'))
ax1.xaxis.grid(True)

plotMap[keys[0]]['plot'] = ax1.plot_date(
    dates.date2num(first_dates),
    [r[1] for r in realdata[keys[0]]['data']], colorMap[keys[0]], xdate=True)

if len(keys) > 1:
    first = True
    for key in keys[1:]:
        if first:
            ax2 = ax1.twinx()
            ax2.set_ylabel(u'%s' % realdata[key]['name'])
            first = False
        plotMap[key] = {}
        plotMap[key]['label'] = u'%s' % realdata[key]['name']
        plotMap[key]['plot'] = ax2.plot_date(
            dates.date2num([r[0] for r in realdata[key]['data']]),
            [r[1] for r in realdata[key]['data']], colorMap[key], xdate=True)

plt.legend([value['plot'] for key, value in plotMap.iteritems()],
           [value['label'] for key, value in plotMap.iteritems()], loc=2)
plt.savefig(path + "node.png", dpi=300, bbox_inches='tight')
Could someone point out why I'm not getting the desired results, please?
Edit 1:
I moved the formatting block after the plotting and seem to be getting better results now. They are still not the desired results though. If the period is less than a day I now get ticks every 2 hours (interval=2), but I wish I could get those ticks at even hours rather than uneven hours. Is that possible?
if date_range > datetime.timedelta(days=1):
    xax.set_major_locator(dates.DayLocator(bymonthday=range(1, 32), interval=1))
    xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    xax.set_major_locator(dates.HourLocator(byhour=range(24), interval=2))
    xax.set_major_formatter(dates.DateFormatter('%H:%M'))
Edit 2:
This seemed to give me what I wanted:
if date_range > datetime.timedelta(days=1):
    xax.set_major_locator(dates.DayLocator(bymonthday=range(1, 32), interval=1))
    xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    xax.set_major_locator(dates.HourLocator(byhour=range(0, 24, 2)))
    xax.set_major_formatter(dates.DateFormatter('%H:%M'))
Alan
You are making this way harder on yourself than you need to. matplotlib can plot directly against datetime objects. I suspect your problem is that you are setting up the locators, then plotting, and the plotting is replacing your locators/formatters with the default auto versions. Try moving that block of logic about the locators to below the plotting loop.
I think that this could replace a fair chunk of your code:
d = datetime.timedelta(minutes=2)
now = datetime.datetime.now()
times = [now + d * j for j in range(500)]

ax = plt.gca()  # get the current axes
ax.plot(times, range(500))

xax = ax.get_xaxis()             # get the x-axis
adf = xax.get_major_formatter()  # get the auto-formatter

adf.scaled[1. / 24] = '%H:%M'  # set the < 1d scale to H:M
adf.scaled[1.0] = '%Y-%m-%d'   # set the > 1d < 1m scale to Y-m-d
adf.scaled[30.] = '%Y-%m'      # set the > 1m < 1Y scale to Y-m
adf.scaled[365.] = '%Y'        # set the > 1y scale to Y

plt.draw()
doc for AutoDateFormatter
I achieved what I wanted by doing this:
if date_range > datetime.timedelta(days=1):
    xax.set_major_locator(dates.DayLocator(bymonthday=range(1, 32), interval=1))
    xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    xax.set_major_locator(dates.HourLocator(byhour=range(0, 24, 2)))
    xax.set_major_formatter(dates.DateFormatter('%H:%M'))
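For anyone reproducing this, here is a self-contained sketch of that final approach against synthetic data (the series, the deliberately uneven start time and the output file name are made up); setting the locator and formatter after plotting keeps the even-hour ticks:

import datetime
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as dates

# Synthetic 6-hour series at 1-minute resolution, starting at an "uneven" time
start = datetime.datetime(2020, 1, 1, 5, 17)
times = [start + datetime.timedelta(minutes=m) for m in range(6 * 60)]
values = range(len(times))

fig, ax = plt.subplots()
ax.plot(times, values)

date_range = times[-1] - times[0]
if date_range > datetime.timedelta(days=1):
    ax.xaxis.set_major_locator(dates.DayLocator(interval=1))
    ax.xaxis.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    ax.xaxis.set_major_locator(dates.HourLocator(byhour=range(0, 24, 2)))  # 06:00, 08:00, ...
    ax.xaxis.set_major_formatter(dates.DateFormatter('%H:%M'))

fig.savefig('ticks.png')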