Adding a regression line in Python using matplotlib

I have a question about drawing a regression line and determining its slope. I am researching water heights of inland lakes in Tibet using satellite data. This script contains one year of data for one lake.
However, I want to determine the annual rise of the lake for both the reference height and the total beams. Could someone help me?
This is the link towards the excel file: https://drive.google.com/file/d/12wD2ByQC6ObNCWq_yIhkXiNsV3KfDpit/view?usp=sharing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Graph in chronological order
heights = pd.read_excel ('Qinghai_dates_heights.xlsx')
dates = (heights.loc[:,'Date'])
strong_beams = (heights.loc[:,'Strong total'])
weak_beams = (heights.loc[:,'Weak total'])
total_beams = (heights.loc[:,'Total'])
# setting the reference data from Hydrolabs
reference_dates = (heights.loc[:,'Date.1'])
reference_heights = (heights.loc[:,'Hydrolabs'])
# Set the locator
locator = mdates.MonthLocator() # every month
# Specify the format - %b gives us Jan, Feb...
fmt = mdates.DateFormatter('%b')
#plt.plot(dates,strong_beams, label='Strong Beams', marker="o")
#plt.plot(dates,weak_beams, label='Weak Beams', marker="o")
plt.plot(dates, total_beams, label='Total Beams', marker="o")
plt.plot(reference_dates, reference_heights, label='Reference height (Hydrolabs)', marker="o")
X = plt.gca().xaxis
X.set_major_locator(locator)
# Specify formatter
X.set_major_formatter(fmt)
plt.xlabel('Date [months]')
plt.ylabel('elevation [m]')
plt.title("Water-Height Qinghai from November 2018 - November 2019 ")
plt.legend()
plt.show()

Does this help? I usually use sklearn for this.
import numpy as np
from matplotlib import pyplot as plt
from sklearn import linear_model, datasets
Generate a set of data
X = np.linspace(0, 10)
line_X = X[:, np.newaxis]
Y = X + 0.2*np.random.normal(size=50)
Choose your regression model (there are plenty more, depending on your needs)
lr = linear_model.LinearRegression()
Here you really do the fit
lr.fit(line_X, Y)
Here you extract the parameters, since you seem to need them:
slope = lr.coef_[0]
intercept = lr.intercept_
And then you plot
plt.plot(X, slope*X + intercept, ls='-', marker=' ')
plt.plot(X, Y)
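For the lake data in the question, the same idea can be applied by converting the dates to elapsed days before fitting. Below is a minimal sketch that uses np.polyfit instead of sklearn. The synthetic dates and heights (including the 0.5 m/year trend and the 3195 m base level) are made-up stand-ins for the 'Date' and 'Total' columns, which would come from the Excel file in practice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 'Date' and 'Total' columns (assumption, not the real data)
dates = pd.date_range("2018-11-01", "2019-11-01", freq="15D")
rng = np.random.default_rng(0)

# Regression needs numbers, so express each date as days since the first observation
days = (dates - dates[0]).days.to_numpy(dtype=float)
heights = 3195.0 + 0.5 * days / 365.25 + 0.02 * rng.normal(size=days.size)

# Degree-1 fit: heights ~ slope_per_day * days + intercept
slope_per_day, intercept = np.polyfit(days, heights, deg=1)

# Scale the daily slope to a yearly rate
annual_rise = slope_per_day * 365.25
print(f"annual rise: {annual_rise:.3f} m/year")
```

The same call on reference_dates and reference_heights should give the slope of the reference series. Rows with missing values would need to be dropped first, and the fitted line can be drawn with plt.plot(dates, slope_per_day * days + intercept).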

Related

polynomial fitting of a signal and plotting the fitted signal

I am trying to find a polynomial expression that fits my function (signal). I am using numpy.polynomial.polynomial.Polynomial.fit to fit the signal and obtain the coefficients. After generating the coefficients, I want to put them back into the polynomial equation, compute the corresponding y-values, and plot them on the graph. But I am not getting what I want (the orange line). What am I doing wrong here?
Thanks.
import math
def getYValueFromCoeff(f,coeff_list): # low to high order
    y_plot_values=[]
    for j in range(len(f)):
        item_list= []
        for i in range(len(coeff_list)):
            item= (coeff_list[i])*((f[j])**i)
            item_list.append(item)
        y_plot_values.append(sum(item_list))
    print(len(y_plot_values))
    return y_plot_values
from numpy.polynomial import Polynomial as poly
import numpy as np
import matplotlib.pyplot as plt
no_of_coef= 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
test1= poly.fit(x,y,no_of_coef)
coeffs= test1.coef
#print(test1.coef)
coef_y= getYValueFromCoeff(x, test1.coef)
#print(coef_y)
plt.plot(x,y)
plt.plot(x, coef_y)
If you check out the documentation, consider the two properties poly.domain and poly.window. To avoid numerical issues, the range poly.domain = [x.min(), x.max()] of the independent variable (x) that we pass to fit() is mapped to poly.window = [-1, 1]. This means the coefficients you get from poly.coef apply to this normalized range. You can change this behaviour (sacrificing numerical stability) by setting the window equal to the domain, which will make your curves match:
...
test1 = poly.fit(x, y, deg=no_of_coef, window=[x.min(), x.max()])
...
But unless you have a good reason to do that, I'd stick to the default behaviour of fit().
As a side note: Evaluating polynomials or lists of coefficients is already implemented in numpy, e.g. using directly
coef_y = test1(x)
or alternatively using np.polyval.
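If the coefficients in terms of the original x are needed while keeping the default scaling during the fit, the fitted object can be converted afterwards. The following is a minimal sketch, assuming a lower degree (4) than in the question to keep the fit well conditioned; the variable names are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.polynomial import polyval

x = np.linspace(0, 0.01, 10)
y = np.sin(np.pi * x / 0.01)

# The fit is done on x mapped from [x.min(), x.max()] to [-1, 1]
fitted = Polynomial.fit(x, y, deg=4)

# convert() re-expresses the same polynomial with an identity mapping,
# so these coefficients apply directly to the original x (low to high order)
unscaled = fitted.convert()

# Evaluate by hand and with numpy's low-to-high polyval
manual = sum(c * x**i for i, c in enumerate(unscaled.coef))
via_polyval = polyval(x, unscaled.coef)

print(np.allclose(manual, fitted(x)), np.allclose(via_polyval, fitted(x)))
```

Note that np.polyval expects coefficients from high to low order, whereas numpy.polynomial.polynomial.polyval (used here) expects low to high, matching Polynomial.coef.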
I always like to see original solutions to problems. I urge you to continue to pursue that, as it is the best way to learn how to fit functions programmatically. I also wanted to provide a solution that is much more tailored towards a standard numpy implementation. As for your custom function, you did really well. The only issue is that np.polyfit returns the coefficients from high to low order, while you were counting up in powers from 0 to the highest power. Simply counting down from the highest power to 0 allows your function to give the correct result. Notice how your function overlays perfectly with np.polyval.
import numpy as np
import matplotlib.pyplot as plt
def getYValueFromCoeff(f,coeff_list): # high to low order (as returned by np.polyfit)
    y_plot_values=[]
    for j in range(len(f)):
        item_list= []
        for i in range(len(coeff_list)):
            item= (coeff_list[i])*((f[j])**(len(coeff_list)-i-1))
            item_list.append(item)
        y_plot_values.append(sum(item_list))
    print(len(y_plot_values))
    return y_plot_values
no_of_coef = 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
coeffs = np.polyfit(x,y,no_of_coef)
coef_y = np.polyval(coeffs,x)
COEF_Y = getYValueFromCoeff(x,coeffs)
plt.figure()
plt.plot(x,y)
plt.plot(x, coef_y)
plt.plot(x, COEF_Y)
plt.legend(['Original Function', 'Fitted Function', 'Custom Fitting'])
plt.show()
Here's the simple way of doing it if you didn't know that already...
import math
from numpy.polynomial import Polynomial as poly
import numpy as np
import matplotlib.pyplot as plt
no_of_coef= 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
test1= poly.fit(x,y,no_of_coef)
plt.plot(x, y, 'r', label='original y')
x = np.linspace(0, 0.01, 1000)
plt.plot(x, test1(x), 'b', label='y_fit')
plt.legend()
plt.show()

For my code I am trying to graph two separate waveforms using matplotlib. My output does not show two clear waveforms. How do I fix this?

import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('Lab6.csv', dtype='float', delimiter=',', unpack=True)
t = (data[:1])
yt = np.cos(4 * np.pi * t )
y = data[1:2]
plt.figure()
plt.plot(t,yt,t,y,'o')
plt.xlabel('Time, s')
plt.ylabel('Voltage, V')
plt.legend(('Signal 1', 'Signal 1'))
plt.show()
I imported my data from a CSV file. I am looking to have two waveforms with lines between each data point and a separate color for each wave.
Output from code
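One likely cause, assuming Lab6.csv holds two comma-separated columns (time and voltage): with unpack=True, data[:1] and data[1:2] are slices that keep a leading dimension of length 1, so plt.plot receives arrays of shape (1, N) and treats every sample as its own one-point series. Indexing with data[0] and data[1] gives 1D arrays instead. The sketch below uses an in-memory stand-in for the CSV file, so the sample values are made up.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# In-memory stand-in for 'Lab6.csv' (assumption: column 0 is time, column 1 is voltage)
t_demo = np.linspace(0, 1, 50)
csv_text = "\n".join(f"{ti},{np.sin(4 * np.pi * ti)}" for ti in t_demo)

data = np.loadtxt(io.StringIO(csv_text), delimiter=",", unpack=True)

t = data[0]                  # shape (50,), whereas data[:1] has shape (1, 50)
y = data[1]                  # measured signal
yt = np.cos(4 * np.pi * t)   # computed signal

fig, ax = plt.subplots()
# '-o' draws a line with a marker at each point; each call gets its own default color
ax.plot(t, yt, "-o", label="Signal 1")
ax.plot(t, y, "-o", label="Signal 2")
ax.set_xlabel("Time, s")
ax.set_ylabel("Voltage, V")
ax.legend()
```

Calling plt.show() or fig.savefig(...) afterwards displays or stores the figure.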

Python rolling Sharpe ratio with Pandas or NumPy

I am trying to generate a plot of the 6-month rolling Sharpe ratio using Python with Pandas/NumPy.
My input data is below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
print(df.head(20))
returns
Date
2008-01-01 0.232794
2008-01-02 0.957157
2008-01-03 0.079939
2008-01-04 0.772999
2008-01-05 0.708377
2008-01-06 0.579662
2008-01-07 0.998632
2008-01-08 0.432605
2008-01-09 0.499041
2008-01-10 0.693420
2008-01-11 0.330222
2008-01-12 0.109280
2008-01-13 0.776309
2008-01-14 0.079325
2008-01-15 0.559206
2008-01-16 0.748133
2008-01-17 0.747319
2008-01-18 0.936322
2008-01-19 0.211246
2008-01-20 0.755340
What I want
The type of plot I am trying to produce is this or the first plot from here (see below).
My attempt
Here is the equation I am using:
def my_rolling_sharpe(y):
    return np.sqrt(126) * (y.mean() / y.std()) # 21 days per month X 6 months = 126
# Calculate rolling Sharpe ratio
df['rs'] = my_rolling_sharpe(df['returns'])
fig, ax = plt.subplots(figsize=(10, 3))
df['rs'].plot(style='-', lw=3, color='indianred', label='Sharpe')\
.axhline(y = 0, color = "black", lw = 3)
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
fig.tight_layout()
plt.show()
The problem is that I am getting a horizontal line since my function is giving a single value for the Sharpe ratio. This value is the same for all the Dates. In the example plots, they appear to be showing many ratios.
Question
Is it possible to plot a 6-month rolling Sharpe ratio that changes from one day to the next?
Approximately correct solution using df.rolling and a fixed window size of 180 days:
df['rs'] = df['returns'].rolling('180d').apply(my_rolling_sharpe)
This window isn't exactly 6 calendar months wide because rolling requires a fixed window size, so trying window='6MS' (6 Month Starts) throws a ValueError.
To calculate the Sharpe ratio for a window exactly 6 calendar months wide, I'll copy this super cool answer by SO user Mike:
df['rs2'] = [my_rolling_sharpe(df.loc[d - pd.offsets.DateOffset(months=6):d, 'returns'])
for d in df.index]
# Compare the two windows
df.plot(y=['rs', 'rs2'], linewidth=0.5)
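For reference, here is a self-contained sketch of the fixed 180-day window approach, together with a variant that avoids apply by combining the built-in rolling mean and standard deviation. The normally distributed sample returns and the column names rs_apply and rs_fast are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def rolling_sharpe_value(window_returns, periods=126):
    # Same formula as in the question: sqrt(126) * mean / std for one window
    return np.sqrt(periods) * window_returns.mean() / window_returns.std()

# Made-up daily returns on a daily DatetimeIndex
idx = pd.date_range("2008-01-01", "2015-12-01")
rng = np.random.default_rng(0)
df = pd.DataFrame({"returns": rng.normal(0.0005, 0.01, idx.size)}, index=idx)

# Time-based window: each row uses the observations from the preceding 180 days
df["rs_apply"] = df["returns"].rolling("180D").apply(rolling_sharpe_value)

# Equivalent result from the built-in rolling aggregations (both use ddof=1 for std)
roll = df["returns"].rolling("180D")
df["rs_fast"] = np.sqrt(126) * roll.mean() / roll.std()

print(df[["rs_apply", "rs_fast"]].tail())
```

The first row is NaN because a single observation has no sample standard deviation, and the early rows are based on fewer than 180 observations; passing min_periods to rolling is one way to mask them.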
I have prepared an alternative solution to your question, based solely on the window functions from pandas.
Here I define the Sharpe ratio calculation "on the fly". Please note the following parameters in this solution:
I have used a risk-free rate of 2%
The dashed line is just a benchmark for the rolling Sharpe ratio, with a value of 1.6
So the code is the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
df['rolling_SR'] = df.returns.rolling(180).apply(lambda x: (x.mean() - 0.02) / x.std(), raw = True)
df.fillna(0, inplace = True)
df[df['rolling_SR'] > 0].rolling_SR.plot(style='-', lw=3, color='orange',
label='Sharpe', figsize = (10,7))\
.axhline(y = 1.6, color = "blue", lw = 3,
linestyle = '--')
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
plt.show()
print('---------------------------------------------------------------')
print('In case you want to check the result data\n')
print(df.tail()) # I use tail because of the size of your window.
You should get something similar to this picture

How do I fit a line to this data?

I've got the following data:
I'm interested in fitting a line on the 'middle bit' (intercept 0). How do I do that? It would be useful to get a figure for the gradient as well.
(FYI These are a list of cash transactions, in and out. The gradient would be the profit or loss).
Here's some of the data:
https://gist.github.com/chrism2671/1081c13b6760878b457a112d2041622f
You can use numpy.polyfit and numpy.poly1d to achieve that:
import matplotlib.pyplot as plt
import numpy as np
# Create data
ls = np.linspace(0, 100)
s = np.random.rand(len(ls))*100 + ls
# Fit the data
z = np.polyfit(ls, s, deg=1)
p = np.poly1d(z)
# Plotting
plt.figure(figsize=(16,4.5))
plt.plot(ls, s,
alpha=.75, label='signal')
plt.plot(ls, p(ls),
linewidth=1, linestyle='--', color='r', label='polyfit')
plt.legend(ncol=2)
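The question also asks for a line with intercept 0 fitted only to the middle part of the data, which np.polyfit with deg=1 does not enforce. A minimal sketch of one way to do that is a least-squares fit with no intercept term on a masked subset. The synthetic data and the bounds 20 and 80 that define the "middle bit" are hypothetical.

```python
import numpy as np

# Made-up data with a true gradient of 2.5 through the origin
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 50)
y = 2.5 * x + rng.normal(scale=5.0, size=x.size)

# Keep only the middle section (hypothetical bounds)
middle = (x >= 20) & (x <= 80)
xm, ym = x[middle], y[middle]

# A design matrix with a single column (no column of ones) forces intercept 0
solution, *_ = np.linalg.lstsq(xm[:, np.newaxis], ym, rcond=None)
gradient = solution[0]

# Closed form of the same estimate: sum(x*y) / sum(x*x)
gradient_closed_form = (xm @ ym) / (xm @ xm)
print(gradient, gradient_closed_form)
```

The fitted line can then be drawn with plt.plot(xm, gradient * xm).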

How do I set up the x-axis using year-month as tick marks in matplotlib?

I have a 2D variable XVAR from a netCDF file, with dimensions [year, month]. I want to plot the flattened XVAR (a 1D array of length nyear*nmonth) and set up the x-axis like this: years on major ticks and months on minor ticks. The difficulty is that I do not know how to create a 1D array of dates at a monthly step. There is no monthdelta method that I could use (though I understand that this is because months have different numbers of days).
In the delta=? step below, I tried delta=relativedelta.relativedelta(months=1), but got the error "object has no attribute 'total_seconds'", which I do not completely understand.
import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_toolkits.basemap import Basemap
from datetime import date, timedelta
ncfile = Dataset('filepath',mode='r')
XVAR4d = ncfile.variables['XVAR'][:]
XVAR2d = np.nanmean(XVAR4d,axis=(2,3)).flatten()
yrs = ncfile.variables['YEAR']
stt = date(np.min(yrs),1,1)
end = date(np.max(yrs)+1,1,1)
delta = ?
dates = mdates.drange(stt,end,delta)
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
yearsFmt = mdates.DateFormatter('%Y')
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(yearsFmt)
ax1.xaxis.set_minor_locator(months)
ax1.set_xlim(stt,end)
ax1.plot(dates,XVAR2d,c='r')
I decided I did like the idea of my second comment, so I am turning it into an actual proposed answer.
Instead of using drange, create the dates yourself:
totalMonths = 12*(np.max(yrs) - np.min(yrs)+1)
dates = mdates.date2num([date(np.min(yrs)+(i//12),i%12+1,1) for i in range(totalMonths)])
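An alternative way to build the monthly dates is NumPy's datetime64 type with month resolution, which steps by calendar month rather than by a fixed number of days. This is a minimal sketch; first_year and last_year are hypothetical values standing in for np.min(yrs) and np.max(yrs).

```python
from datetime import date
import numpy as np

first_year, last_year = 2001, 2003  # hypothetical stand-ins for np.min(yrs), np.max(yrs)

# One entry per calendar month, January of the first year through December of the last
months = np.arange(f"{first_year}-01", f"{last_year + 1}-01", dtype="datetime64[M]")

# Convert to day resolution: each value becomes the first day of its month
dates = months.astype("datetime64[D]")

# Same dates built with the list comprehension from the answer, for comparison
total_months = 12 * (last_year - first_year + 1)
expected = [date(first_year + i // 12, i % 12 + 1, 1) for i in range(total_months)]
print(dates.tolist() == expected)
```

Recent matplotlib versions accept datetime64 arrays directly on a date axis, so dates should be usable in ax1.plot together with the YearLocator and MonthLocator setup from the question.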