How do I fit a line to this data? - pandas

I've got the following data:
I'm interested in fitting a line on the 'middle bit' (intercept 0). How do I do that? It would be useful to get a figure for the gradient as well.
(FYI: this is a list of cash transactions, in and out. The gradient would be the profit or loss.)
Here's some of the data:
https://gist.github.com/chrism2671/1081c13b6760878b457a112d2041622f

You can use numpy.polyfit and numpy.poly1d to achieve that:
import matplotlib.pyplot as plt
import numpy as np
# Create data
ls = np.linspace(0, 100)
s = np.random.rand(len(ls))*100 + ls
# Fit the data
z = np.polyfit(ls, s, deg=1)
p = np.poly1d(z)
# Plotting
plt.figure(figsize=(16,4.5))
plt.plot(ls, s,
         alpha=.75, label='signal')
plt.plot(ls, p(ls),
         linewidth=1, linestyle='--', color='r', label='polyfit')
plt.legend(ncol=2)
Using the data you provided:
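Note that np.polyfit with deg=1 also estimates an intercept. For the zero-intercept fit on the 'middle bit' that the question asks about, one possible approach is a least-squares fit of y = m*x on a masked sub-range. The sketch below is only an illustration: the data is synthetic, and the bounds 20 and 80 used to pick the middle section are arbitrary placeholders, not values taken from the linked gist.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy line through the origin with true gradient 2.5
x = np.linspace(0, 100, 201)
y = 2.5 * x + rng.normal(scale=5.0, size=x.size)

# Keep only the "middle bit"; the bounds 20 and 80 are arbitrary placeholders
mask = (x >= 20) & (x <= 80)
x_mid, y_mid = x[mask], y[mask]

# Least-squares fit of y = m * x: the design matrix has a single column (x)
# and no column of ones, so the intercept is forced to zero
solution, *_ = np.linalg.lstsq(x_mid[:, np.newaxis], y_mid, rcond=None)
gradient = solution[0]

# Closed form for the same one-parameter problem, as a cross-check
gradient_closed_form = np.sum(x_mid * y_mid) / np.sum(x_mid * x_mid)

print(f"gradient through the origin: {gradient:.3f}")
```

The fitted line can then be drawn with plt.plot(x_mid, gradient * x_mid), and gradient is the figure the question asks for.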

Related

polynomial fitting of a signal and plotting the fitted signal

I am trying to find a polynomial expression that fits my function (signal). I am using numpy.polynomial.polynomial.Polynomial.fit to fit the signal and obtain the coefficients. After generating the coefficients, I want to put them back into the polynomial equation, get the corresponding y-values, and plot them on the graph. But the result (orange line) is not what I expect. What am I doing wrong here?
Thanks.
import math
def getYValueFromCoeff(f,coeff_list): # low to high order
    y_plot_values=[]
    for j in range(len(f)):
        item_list= []
        for i in range(len(coeff_list)):
            item= (coeff_list[i])*((f[j])**i)
            item_list.append(item)
        y_plot_values.append(sum(item_list))
    print(len(y_plot_values))
    return y_plot_values
from numpy.polynomial import Polynomial as poly
import numpy as np
import matplotlib.pyplot as plt
no_of_coef= 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
test1= poly.fit(x,y,no_of_coef)
coeffs= test1.coef
#print(test1.coef)
coef_y= getYValueFromCoeff(x, test1.coef)
#print(coef_y)
plt.plot(x,y)
plt.plot(x, coef_y)
If you check out the documentation, consider the two properties poly.domain and poly.window. To avoid numerical issues, the range poly.domain = [x.min(), x.max()] of the independent variable (x) that we pass to fit() is normalized to poly.window = [-1, 1]. This means the coefficients you get from poly.coef apply to this normalized range. But you can adjust this behaviour (sacrificing numerical stability): setting the window to match the domain will make your curves match:
...
test1 = poly.fit(x, y, deg=no_of_coef, window=[x.min(), x.max()])
...
But unless you have a good reason to do that, I'd stick to the default behaviour of fit().
As a side note: Evaluating polynomials or lists of coefficients is already implemented in numpy, e.g. using directly
coef_y = test1(x)
or alternatively using np.polyval.
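If the coefficients in terms of the original x are needed without changing the window, the fitted object's convert() method maps them back to the default domain. The sketch below is an illustration under assumptions of my own: degree 3 and 50 samples are chosen to keep the fit well conditioned and are not the values from the question.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import polynomial as P

# Same kind of signal as in the question, but with more samples (assumption)
x = np.linspace(0, 0.01, 50)
y = np.sin(np.pi * x / 0.01)

# Degree 3 is an arbitrary choice for this illustration
fitted = Polynomial.fit(x, y, deg=3)

# Coefficients for the scaled variable that lives in fitted.window
scaled_coef = fitted.coef

# Coefficients for the original, unscaled x (still low to high order)
unscaled_coef = fitted.convert().coef

y_direct = fitted(x)                      # handles the domain mapping itself
y_manual = P.polyval(x, unscaled_coef)    # manual evaluation, unscaled coefficients
y_wrong = P.polyval(x, scaled_coef)       # scaled coefficients applied to raw x

print(np.allclose(y_direct, y_manual))
```

Here y_wrong reproduces the mismatch described in the question, while y_manual agrees with calling the fitted polynomial directly.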
I always like to see original solutions to problems. I urge you to continue to pursue that, as it is the best way to learn how to fit functions programmatically. I also wanted to provide a solution that is more tailored towards a standard numpy implementation. As for your custom function, you did really well. The only issue is that np.polyfit (which I use below) returns the coefficients from high to low order, while you were counting up in powers from 0 to the highest power. Simply counting down from the highest power to 0 allows your function to give the correct result. Notice how your function overlays perfectly with numpy's polyval.
import numpy as np
import matplotlib.pyplot as plt
def getYValueFromCoeff(f,coeff_list): # high to low order (np.polyfit convention)
    y_plot_values=[]
    for j in range(len(f)):
        item_list= []
        for i in range(len(coeff_list)):
            item= (coeff_list[i])*((f[j])**(len(coeff_list)-i-1))
            item_list.append(item)
        y_plot_values.append(sum(item_list))
    print(len(y_plot_values))
    return y_plot_values
no_of_coef = 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
coeffs = np.polyfit(x,y,no_of_coef)
coef_y = np.polyval(coeffs,x)
COEF_Y = getYValueFromCoeff(x,coeffs)
plt.figure()
plt.plot(x,y)
plt.plot(x, coef_y)
plt.plot(x, COEF_Y)
plt.legend(['Original Function', 'Fitted Function', 'Custom Fitting'])
plt.show()
Output
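The two coefficient orderings discussed in this thread can be checked on a polynomial whose coefficients are known in advance. This is a small sketch with made-up values (1, 2 and 3), not data from the question: np.polyfit returns the highest power first, while the numpy.polynomial classes store the lowest power first.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Exact quadratic y = 1 + 2*x + 3*x**2 (illustrative values)
x = np.linspace(0, 5, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Legacy interface: highest power first
high_to_low = np.polyfit(x, y, deg=2)

# Polynomial class: lowest power first; convert() undoes the domain scaling
low_to_high = Polynomial.fit(x, y, deg=2).convert().coef

print(high_to_low)
print(low_to_high)
```

A hand-written evaluator therefore has to count powers down for np.polyfit output and up for Polynomial coefficients.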
Here's the simple way of doing it if you didn't know that already...
import math
from numpy.polynomial import Polynomial as poly
import numpy as np
import matplotlib.pyplot as plt
no_of_coef= 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
test1= poly.fit(x,y,no_of_coef)
plt.plot(x, y, 'r', label='original y')
x = np.linspace(0, 0.01, 1000)
plt.plot(x, test1(x), 'b', label='y_fit')
plt.legend()
plt.show()

In my code I am trying to graph two separate waveforms using matplotlib. My output does not show two clear waveforms. How do I fix this?

import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('Lab6.csv', dtype='float', delimiter=',', unpack=True)
t = (data[:1])
yt = np.cos(4 * np.pi * t )
y = data[1:2]
plt.figure()
plt.plot(t,yt,t,y,'o')
plt.xlabel('Time, s')
plt.ylabel('Voltage, V')
plt.legend(('Signal 1', 'Signal 1'))
plt.show()
I imported my data from a CSV file. I want two waveforms, with lines between the data points and a separate color for each wave.
Output from code

adding regression line in python using matplotlib

I have a question about drawing a regression line and determining the slope of that line. I am researching water heights of inland lakes in Tibet with the help of satellite data. This script contains one year of data for one lake.
However, I want to determine the annual rise of the lake for both the reference height and the total beams. Is there someone who could help me?
This is the link towards the excel file: https://drive.google.com/file/d/12wD2ByQC6ObNCWq_yIhkXiNsV3KfDpit/view?usp=sharing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Graph in chronological order
heights = pd.read_excel ('Qinghai_dates_heights.xlsx')
dates = (heights.loc[:,'Date'])
strong_beams = (heights.loc[:,'Strong total'])
weak_beams = (heights.loc[:,'Weak total'])
total_beams = (heights.loc[:,'Total'])
# setting the reference data from Hydrolabs
reference_dates = (heights.loc[:,'Date.1'])
reference_heights = (heights.loc[:,'Hydrolabs'])
# Set the locator
locator = mdates.MonthLocator() # every month
# Specify the format - %b gives us Jan, Feb...
fmt = mdates.DateFormatter('%b')
#plt.plot(dates,strong_beams, label='Strong Beams', marker="o")
#plt.plot(dates,weak_beams, label='Weak Beams', marker="o")
plt.plot(dates, total_beams, label='Total Beams', marker="o")
plt.plot(reference_dates, reference_heights, label='Reference height (Hydrolabs)', marker="o")
X = plt.gca().xaxis
X.set_major_locator(locator)
# Specify formatter
X.set_major_formatter(fmt)
plt.xlabel('Date [months]')
plt.ylabel('elevation [m]')
plt.title("Water-Height Qinghai from November 2018 - November 2019 ")
plt.legend()
plt.show()
Does this help? I usually use sklearn for this.
import numpy as np
from matplotlib import pyplot as plt
from sklearn import linear_model, datasets
Generate a set of data
X = np.linspace(0, 10)
line_X = X[:, np.newaxis]
Y = X + 0.2*np.random.normal(size=50)
Choose your regression model (there are plenty more, depending on your needs)
lr = linear_model.LinearRegression()
Here you really do the fit
lr.fit(line_X, Y)
Here you extract the parameters, since you seem to need them ;)
slope = lr.coef_[0]
intercept = lr.intercept_
And then you plot
plt.plot(X, slope*X + intercept, ls='-', marker=' ')
plt.plot(X, Y)
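To turn the slope into an annual rise when the x-axis holds dates, the dates first need to be converted to numbers. The sketch below is one possible way, using np.polyfit rather than sklearn, and it runs on synthetic values: the dates, the base height of 3196 m and the rise of 0.4 m per year are invented stand-ins for the 'Date' and 'Total' columns of the spreadsheet.

```python
import numpy as np

# Hypothetical stand-in for the 'Date' column: one sample every 15 days
dates = np.arange("2018-11-01", "2019-11-01", dtype="datetime64[D]")[::15]

# Days elapsed since the first measurement, as floats
days = (dates - dates[0]) / np.timedelta64(1, "D")

# Hypothetical stand-in for the 'Total' column (values are made up)
rng = np.random.default_rng(1)
true_rise_per_year = 0.4
heights = 3196.0 + true_rise_per_year * days / 365.25
heights = heights + rng.normal(scale=0.02, size=days.size)

# Straight-line fit: slope in metres per day, intercept in metres
slope_per_day, intercept = np.polyfit(days, heights, deg=1)
rise_per_year = slope_per_day * 365.25

# Values of the regression line, ready to plot against `dates`
trend = slope_per_day * days + intercept

print(f"estimated rise: {rise_per_year:.3f} m/year")
```

With the real data, the day offsets could probably be obtained from the pandas column as (dates - dates.iloc[0]).dt.days, after dropping any empty rows, and plt.plot(dates, trend) would overlay the regression line on the existing figure. The same steps apply to the reference heights.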

Exceedance (1-cdf) plot using seaborn and pandas

Assume a dataframe df with a single column (say latency, i.e. a uni-variate sample). The exceedance function is calculated and plotted as follows:
sorted_df = df.sort_values('latency')
samples = len(sorted_df)
exceedance = [1-(x/samples) for x in range(1, samples + 1)]
ax.plot(sorted_df['latency'], exceedance, 'o')
Is there a simpler or more elegant way to calculate and plot the exceedance function of a univariate sample using seaborn (maybe distplot)? I recently learnt to use seaborn's distplot function, but I can only plot the cdf as follows:
sns.distplot(df['latency'], hist=False, kde_kws={'cumulative':True})
I'm specifically interested in seaborn because I plan to use this function along with Seaborn.FacetGrid to get an exceedance plot for several factors.
Because you asked for a more elegant way, the following saves you two lines of code and is faster.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
def plot_exceedance(data, **kwargs):
    sorted_df = data.sort_values()
    exceedance = 1.-np.arange(1.,len(sorted_df) + 1.)/len(sorted_df)
    plt.plot(sorted_df, exceedance, **kwargs)
g = sns.FacetGrid(df, row='factorA',col='factorB',hue='factorC')
g.map(plot_exceedance, 'latency')
There is no predefined API or parameter to calculate exceedance, so I had to use the code listed above. But considering that I was specifically interested in getting an exceedance plot for several factors and that I could use plt.plot along with seaborn.FacetGrid, the following piece of code worked.
def plot_exceedance(data, **kwargs):
    sorted_df = data.sort_values()
    samples = len(sorted_df)
    exceedance = [1-(x/samples) for x in range(1, samples + 1)]
    ax=plt.gca()
    ax.plot(sorted_df, exceedance, **kwargs)
g = sns.FacetGrid(df, row='factorA',col='factorB',hue='factorC')
g.map(plot_exceedance, 'latency')
where factorA, factorB and factorC are additional columns in df.
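The calculation shared by both answers can be isolated and checked on a tiny sample. This is a sketch only: the helper name exceedance and the four latency values are made up for illustration.

```python
import numpy as np

def exceedance(samples):
    """Return the sorted values and the fraction of samples above each one."""
    values = np.sort(np.asarray(samples, dtype=float))
    n = values.size
    # The i-th smallest value (1-based) is exceeded by n - i of the n samples
    prob = 1.0 - np.arange(1, n + 1) / n
    return values, prob

# Made-up latencies, deliberately unsorted
latency = np.array([12.0, 7.0, 30.0, 18.0])
values, prob = exceedance(latency)

for v, p in zip(values, prob):
    print(v, p)
```

The important detail is that the x-values passed to the plot are the sorted ones. If the installed seaborn is version 0.11 or newer, sns.ecdfplot(data=df, x='latency', complementary=True) should draw a comparable curve directly, although it renders a step line rather than markers.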

matplotlib doesn't show lines

I have a data file exported from Octave which includes two vectors, x and u0. I want to plot u0 versus x in matplotlib with the following code:
import scipy.io
import matplotlib.pyplot as plt
data = scipy.io.loadmat('myfile.mat')
x = data['x']
u0 = data['u0']
plt.plot(x,u0)
plt.show()
The above code gives just a blank figure.
When I replaced the line plt.plot(x,u0) with plt.plot(x,u0,'-bo') I got the following:
Why does the solid line not appear?
Here is the data myfile.mat
I strongly suspect that your data arrays are 2D with a shape of (1, N), i.e. [[0, 0, ...]], which matplotlib is (correctly) plotting as N one-point lines, because each column of a 2D array is treated as a separate data set.
Try:
fig, ax = plt.subplots(1, 1)
ax.plot(x.flatten(), u0.flatten())
plt.show()
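The shape issue can be checked without the .mat file. The sketch below assumes the loaded vectors are 2D row vectors of shape (1, N); that is a guess about the file, which is not available here, and the values are synthetic.

```python
import numpy as np

# Synthetic stand-ins for data['x'] and data['u0'], stored as 1 x N row vectors
x = np.linspace(0.0, 1.0, 5).reshape(1, -1)
u0 = np.sin(x)

# matplotlib plots each column of a 2D array as its own data set,
# so a (1, N) array becomes N lines with a single point each
print(x.shape, u0.shape)

# Flattening gives 1D arrays, which are plotted as one connected line
x_flat = x.ravel()
u0_flat = u0.ravel()
print(x_flat.shape, u0_flat.shape)
```

Printing x.shape and u0.shape on the real data is a quick way to confirm the diagnosis. scipy.io.loadmat also accepts squeeze_me=True, which should drop the unit dimension at load time.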