Slope and its relationship with angles - numpy

I have been using numpy's polyfit function to get a linear fit to some data, and I have gotten some slopes. However, the idea of slopes has me very confused.
I am getting slopes of 0.0142 and of 391! Quite different.
What does a slope of 391 actually mean? Take a look at this:
import numpy as np
import matplotlib.pyplot as plt
xr=np.arange(100)
yr=0.0142*xr
yr2=391*xr
plt.plot(xr,yr,yr2)
print("The angle is:",np.degrees(np.arctan(391)))
The angle is: 89.8534637990051
There is no way that that angle is 89.8 degrees!
What am I getting wrong?

A slope of 391 means that for every unit change in x, y changes by 391. The angle is in fact 89.8 degrees; you cannot see it in your plot because your x and y axes have different scales. If you set them to the same scale (i.e. the same unit length), you will see an angle of 89.8 degrees.
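To make the scale argument concrete, here is a small sketch of the angle a line makes on screen when the two axes are scaled differently. The helper names `true_angle` and `apparent_angle` are made up for illustration, and the sketch assumes a square plotting area that the data ranges fill exactly.

```python
import math

def true_angle(slope):
    """Angle in degrees of the line when both axes use the same unit length."""
    return math.degrees(math.atan(slope))

def apparent_angle(slope, x_span, y_span, box_aspect=1.0):
    """Angle in degrees of the line as it is drawn on screen.

    x_span and y_span are the data ranges shown on each axis; box_aspect is
    the height/width ratio of the plotting area (1.0 assumes a square box).
    """
    # One data unit in y is drawn (x_span / y_span) * box_aspect times as
    # long as one data unit in x, so the drawn slope is rescaled by that factor.
    drawn_slope = slope * (x_span / y_span) * box_aspect
    return math.degrees(math.atan(drawn_slope))

x_span = 99          # xr runs from 0 to 99
y_span = 391 * 99    # yr2 runs from 0 to 391 * 99

print(true_angle(391))                      # angle with equal unit lengths
print(apparent_angle(391, x_span, y_span))  # angle in a square, auto-scaled plot
```

In matplotlib, calling `set_aspect('equal')` on the axes forces both axes to the same unit length, at which point the drawn angle matches the true one (although a plot 99 units wide and about 39,000 units tall is not very practical to look at).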

Related

Why curve fit function does not combine all data points. How to get best fit?

I'm not sure how to decide on a fitting function. Looking at the trend of the data points, I chose a Poisson distribution as my fitting function. The green curve is quite smooth, but the fitted curve is far away from the first data point at position (0, 0.55). I want a smooth curve from the fitting function, but the current fit is far from my actual data points. I tried increasing the number of bins but still get the same type of curve. I suspect that I am either not choosing a proper fitting function or missing something else.
`import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

def Poisson_fit(x, a):
    # one free parameter: the amplitude a
    return a * np.exp(-x)

def Poisson(x):
    return np.exp(-x)

x_data = np.linspace(0, 5, 10)
print("x_data: ", x_data)
# x_data: [0., 0.55555556, 1.11111111, 1.66666667, 2.22222222, 2.77777778, 3.33333333,
#          3.88888889, 4.44444444, 5.]
# x holds the raw samples being histogrammed (not shown in this post)
hist, bin_edges = np.histogram(x, bins=10, density=True)
print("hist: ", hist)
# hist: [5.41041394e-01, 1.42611032e-01, 3.44975130e-02, 7.60221121e-03,
#        1.66115522e-03, 3.26808028e-04, 6.70741368e-05, 1.14168743e-05, 5.70843717e-06,
#        1.42710929e-06]
plt.scatter(x_data, hist, marker='o', color='red')
popt, pcov = optimize.curve_fit(Poisson_fit, x_data, hist)
plt.plot(x_data, Poisson_fit(x_data, *popt), linestyle='--',
         marker='.', color='red', label='Fit')
plt.plot(x_data, Poisson(x_data), marker='.', color='green', label='Poisson')`
#Second Graph(Find best fit)
In the following graph I have fit two different distributions to the data points. It's hard for me to judge which is the best fit. Should I print the error on the fitting function to judge the best fit?
`perr = np.sqrt(np.diag(pcov))`
If all data-points need to coincide with the interpolating fit, splines (e.g. cubic splines) can be used, generally resulting in a reasonably smooth fit (only generally, because what is "reasonably smooth" depends both on the data and the application).
Example:
import numpy as np
from scipy.interpolate import CubicSpline
import pylab
x_data = np.linspace(0,5,10)
y_data = np.array([5.41041394e-01,1.42611032e-01,3.44975130e-02,
7.60221121e-03,1.66115522e-03,3.26808028e-04,
6.70741368e-05,1.14168743e-05,5.70843717e-06,
1.42710929e-06])
spline = CubicSpline(x_data, y_data)
plot_x = np.linspace(0,5,1000)
pylab.plot(x_data, y_data, 'b*', label='Data')
pylab.plot(plot_x, spline(plot_x), 'k-', label='Spline')
pylab.legend(loc='best')
pylab.show()
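On the separate question of how to judge which fitting function is better, one common approach is to compare the residual sum of squares (RSS) of the candidates on the same data. The sketch below is only an illustration under stated assumptions: it compares the question's `a*exp(-x)` with a hypothetical second candidate `a*exp(-b*x)`, and it swaps `curve_fit` for a log-linear `np.polyfit` fit to stay NumPy-only, which weights the points differently than a direct least-squares fit would.

```python
import numpy as np

x_data = np.linspace(0, 5, 10)
hist = np.array([5.41041394e-01, 1.42611032e-01, 3.44975130e-02,
                 7.60221121e-03, 1.66115522e-03, 3.26808028e-04,
                 6.70741368e-05, 1.14168743e-05, 5.70843717e-06,
                 1.42710929e-06])

def rss(y, y_model):
    """Residual sum of squares between the data and a model curve."""
    return float(np.sum((y - y_model) ** 2))

# Candidate 1: a * exp(-x), decay rate fixed at 1 (the form used in the question).
# The model is linear in a, so the least-squares amplitude has a closed form.
basis = np.exp(-x_data)
a_fixed = np.sum(hist * basis) / np.sum(basis ** 2)
rss_fixed = rss(hist, a_fixed * basis)

# Candidate 2 (assumed for illustration): a * exp(-b * x) with a free decay rate.
# Taking logs gives log(y) = log(a) - b * x, a straight line that polyfit can fit.
neg_b, log_a = np.polyfit(x_data, np.log(hist), 1)
a_free, b_free = np.exp(log_a), -neg_b
rss_free = rss(hist, a_free * np.exp(-b_free * x_data))

print("fixed rate: a = %.3f, RSS = %.4g" % (a_fixed, rss_fixed))
print("free rate:  a = %.3f, b = %.3f, RSS = %.4g" % (a_free, b_free, rss_free))
```

A smaller RSS means the curve is closer to the points, but a model with more free parameters will usually fit at least as well, so the comparison is most meaningful between models of similar complexity.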

Cartopy plot high/low sea level pressure on map

I'm migrating from basemap to cartopy. One thing I would like to do is plot high/low pressure on a map, as in basemap. There is a good example of how to do this on this page: https://matplotlib.org/basemap/users/examples.html ("Plot sea-level pressure weather map with labelled highs and lows"). I'm not going to copy and paste the code from this site, but I would like to know how to do the same in cartopy. The main thing I can't get my head around is how to do x < m.xmax and x > m.xmin and y < m.ymax and y > m.ymin in cartopy (some kind of vector transform, I'd imagine).
I've had a good look and can't see this particular example translated into something compatible with cartopy. Any help would be welcome!
In order to write an equivalent program using cartopy, you need to be able to translate two concepts. The first is finding the extent of a projection; this can be done with the get_extent() method of a GeoAxes:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
my_proj = ccrs.Miller(central_longitude=180)
ax = plt.axes(projection=my_proj)
xmin, xmax, ymin, ymax = ax.get_extent()
You also need to transform coordinate points from geographic to projection coordinates, which is the function of the transform_points() method of a coordinate reference system instance:
import numpy as np
lons2d, lats2d = np.meshgrid(lons, lats) # lons lats are in degrees
transformed = my_proj.transform_points(ccrs.Geodetic(), lons2d, lats2d)
x = transformed[..., 0] # lons in projection coordinates
y = transformed[..., 1] # lats in projection coordinates
Now you can use the same technique as in the basemap example to filter and plot points, where instead of m.xmin you use xmin etc.
There are of course alternate ways of doing this which have pros and cons relative to the basemap example. If you come up with something nice you can contribute it to the Cartopy gallery.
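The filtering step mentioned above can be sketched with plain NumPy once the extent and the projected coordinates are available. The helper name `inside_extent` and the numbers below are hypothetical, chosen only to show the boolean mask; in a real script `extent` would come from `ax.get_extent()` and `x`, `y` from `transform_points()`.

```python
import numpy as np

def inside_extent(x, y, extent):
    """Boolean mask of the points that lie strictly inside the map extent.

    extent is (xmin, xmax, ymin, ymax) in projection coordinates, the same
    ordering that get_extent() returns.
    """
    xmin, xmax, ymin, ymax = extent
    return (x > xmin) & (x < xmax) & (y > ymin) & (y < ymax)

# Made-up projected coordinates and extent, for illustration only.
extent = (-10.0, 10.0, -5.0, 5.0)
x = np.array([-12.0, 0.0, 9.5, 10.0])
y = np.array([0.0, 4.0, -6.0, 1.0])

mask = inside_extent(x, y, extent)
print(mask)              # which points fall inside the extent
print(x[mask], y[mask])  # coordinates that are safe to label
```

Using a vectorised mask replaces the per-point `if` test from the basemap example, so the labels for highs and lows can be drawn by looping over `x[mask]` and `y[mask]` only.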

Estimation of t-distribution by mean of samples does not work

I am trying to create a t-distribution by taking the mean of many samples from a normal distribution (and then estimating the shape with kernel density estimation).
For some reason, I am getting pretty different results when I compare what I get with a proper t-distribution. I don't understand what is going wrong, so I think I am confused about something.
Here is the code:
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import seaborn
inner_sample_size = 10
X = np.arange(-3, 3, 0.01)
results = [
np.mean(np.random.normal(size=inner_sample_size))
for _ in range(10000)
]
estimation = gaussian_kde(results)
plt.plot(X, estimation.evaluate(X))
t_samples = np.random.standard_t(inner_sample_size, 10000)
t_estimator = gaussian_kde(t_samples)
plt.plot(X, t_estimator.evaluate(X))
plt.ylabel("Probability density")
plt.show()
And here is the plot I get:
Where the orange line is numpy's own t-distribution, and the blue line is the one estimated by sampling.
Your assumption that the mean of Standard Normals has a T distribution is incorrect. In fact, the mean of Standard Normals has a Normal distribution, which explains the shape of your blue graph. To generate one random variable T from a T distribution with k degrees of freedom, you first generate k+1 independent Standard Normals Z_i, i=0,...,k. You then compute
T = Z_0 / sqrt( sum(Z_i^2, i=1 to k)/k ).
The sum of squared Standard Normals sum(Z_i^2, i=1 to k) has Chi-Squared Distribution with k degrees of freedom, so if there is a pre-canned method to generate this, you should use it, since it's likely more efficient.
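The recipe above can be checked numerically. The following is a minimal sketch, assuming NumPy's `Generator` API with a fixed seed; the variable names are illustrative. It also includes the studentized mean (mean divided by its estimated standard error), which is an addition not mentioned above: for samples of size k from a Normal distribution it follows a T distribution with k - 1 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
k = 10         # degrees of freedom (and inner sample size in the question)
n = 200_000    # number of draws

# Recipe from the answer: Z_0 / sqrt(chi-squared_k / k), with Z_0 drawn
# independently of the chi-squared variable.
z0 = rng.standard_normal(n)
chi2 = rng.chisquare(k, size=n)
t_samples = z0 / np.sqrt(chi2 / k)

# What the question computed: plain means of k standard normals.
# These are Normal with variance 1/k, not T distributed.
draws = rng.standard_normal((n, k))
means = draws.mean(axis=1)

# Studentized mean: dividing by the estimated standard error gives a
# T distribution with k - 1 degrees of freedom.
t_stat = means / (draws.std(axis=1, ddof=1) / np.sqrt(k))

# A T distribution with v degrees of freedom has variance v / (v - 2).
print("constructed T: ", t_samples.var(), "theory:", k / (k - 2))
print("plain means:   ", means.var(), "theory:", 1 / k)
print("studentized:   ", t_stat.var(), "theory:", (k - 1) / (k - 3))
```

The sample variances should land close to the theoretical values, which makes the difference between the three quantities easy to see without a density plot.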

Cubic spline interpolation drops out halfway

I am trying to make a cubic spline interpolation and for some reason, the interpolation drops off in the middle of it. It's very mysterious and I can't find any mention of similar occurrences anywhere online.
This is for my dissertation so I have excluded some labels etc. to keep it obscure intentionally, but all the relevant code is as follows. For context, this is an astronomy related plot.
from scipy.interpolate import CubicSpline
import numpy as np
import matplotlib.pyplot as plt
W = np.array([0.435,0.606,0.814,1.05,1.25,1.40,1.60])
sum_all = np.array([sum435,sum606,sum814,sum105,sum125,sum140,sum160])
sum_can = np.array([sumc435,sumc606,sumc814,sumc105,sumc125,sumc140,sumc160])
fall = CubicSpline(W,sum_all)
newallx=np.arange(0.435,1.6,0.001)
newally=fall(newallx)
fcan = CubicSpline(W,sum_can)
newcanx=np.arange(0.435,1.6,0.001)
newcany=fcan(newcanx)
#----plot
plt.plot(newallx,newally)
plt.plot(newcanx,newcany)
plt.plot(W,sum_all,marker='o',color='r',linestyle='')
plt.plot(W,sum_can,marker='o',color='b',linestyle='')
plt.yscale("log")
plt.ylabel("Flux S$_v$ [erg s$^-$$^1$ cm$^-$$^2$ Hz$^-$$^1$]")
plt.xlabel(r"Wavelength [n$\lambda$]")
plt.show()
The plot that I get from that comes out like this, with a clear gap in the interpolation:
And in case you are wondering, these are the values in the sum_all and sum_can arrays (I assume it doesn't matter, but just in case you want the numbers to plot it yourself):
sum_all:
[ 3.87282732e+32 8.79993191e+32 1.74866333e+33 1.59946687e+33
9.08556547e+33 6.70458731e+33 9.84832359e+33]
sum_can:
[ 2.98381061e+28 1.26194810e+28 3.30328780e+28 2.90254609e+29
3.65117723e+29 3.46256846e+29 3.64483736e+29]
The gap happens between [0.606,1.26194810e+28] and [0.814,3.30328780e+28]. If I change the intervals from 0.001 to something higher, it's obvious that the plot doesn't actually break off but merely dips below 0 on the y-axis (but the plot is continuous). So why does it do that? Surely that's not a correct interpolation? Just looking with our eyes, that's clearly not a well-interpolated connection between those two points.
Any tips or comments would be extremely appreciated. Thank you so much in advance!
The reason for the breakdown is easier to see on a linear scale.
We see that the spline actually passes below 0, and negative values cannot be shown on a log scale.
So I would suggest first taking the base-10 logarithm of the data, performing the spline interpolation on the log-scaled data, and then transforming back by raising 10 to the interpolated values.
from scipy.interpolate import CubicSpline
import numpy as np
import matplotlib.pyplot as plt
W = np.array([0.435,0.606,0.814,1.05,1.25,1.40,1.60])
sum_all = np.array([ 3.87282732e+32, 8.79993191e+32, 1.74866333e+33, 1.59946687e+33,
9.08556547e+33, 6.70458731e+33, 9.84832359e+33])
sum_can = np.array([ 2.98381061e+28, 1.26194810e+28, 3.30328780e+28, 2.90254609e+29,
3.65117723e+29, 3.46256846e+29, 3.64483736e+29])
fall = CubicSpline(W,np.log10(sum_all))
newallx=np.arange(0.435,1.6,0.001)
newally=fall(newallx)
fcan = CubicSpline(W,np.log10(sum_can))
newcanx=np.arange(0.435,1.6,0.01)
newcany=fcan(newcanx)
plt.plot(newallx,10**newally)
plt.plot(newcanx,10**newcany)
plt.plot(W,sum_all,marker='o',color='r',linestyle='')
plt.plot(W,sum_can,marker='o',color='b',linestyle='')
plt.yscale("log")
plt.ylabel("Flux S$_v$ [erg s$^-$$^1$ cm$^-$$^2$ Hz$^-$$^1$]")
plt.xlabel(r"Wavelength [n$\lambda$]")
plt.show()
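To confirm the diagnosis without looking at a plot, the two interpolations can be compared numerically. This is a small sketch using the `sum_can` values from the question and the same evaluation grid; the variable names `raw` and `log_space` are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

W = np.array([0.435, 0.606, 0.814, 1.05, 1.25, 1.40, 1.60])
sum_can = np.array([2.98381061e+28, 1.26194810e+28, 3.30328780e+28, 2.90254609e+29,
                    3.65117723e+29, 3.46256846e+29, 3.64483736e+29])
grid = np.arange(0.435, 1.6, 0.001)

# Spline through the raw values: free to overshoot below zero.
raw = CubicSpline(W, sum_can)(grid)

# Spline through log10 of the values, transformed back with 10**:
# the result is positive by construction.
log_space = 10 ** CubicSpline(W, np.log10(sum_can))(grid)

print("raw spline minimum:      ", raw.min())
print("raw grid points <= 0:    ", np.count_nonzero(raw <= 0))
print("log-space spline minimum:", log_space.min())
```

Any grid point where the raw spline is zero or negative is dropped on a log axis, which is what shows up as the gap in the plot.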

Dotted line style from non-evenly distributed data

I'm new to Python and Matplotlib.
This is my first posting to Stack Overflow. I've been unable to find the answer elsewhere and would be grateful for your help.
I'm using Windows XP, with Enthought Canopy v1.1.1 (32 bit).
I want to plot a dotted-style linear regression line through a scatter plot of data, where both x and y arrays contain random floating point data.
The dots in the resulting dotted line are not distributed evenly along the regression line, and are "smeared together" in the middle of the red line, making it look messy (see upper plot resulting from attached minimal example code).
This does not seem to occur if the items in the array of x values are evenly distributed (lower plot).
I'm therefore guessing that this is an issue with how Matplotlib renders dotted lines, or with how Canopy interfaces Python with Matplotlib.
Please could you tell me a workaround which will make the dots on the dotted line type appear evenly distributed; even if both x and y data are non-evenly distributed; whilst still using Canopy and Matplotlib?
(As a general point, I'm always keen to improve my coding skills - if any code in my example can be written more neatly or concisely, I'd be grateful for your expertise).
Many thanks in anticipation
Dave
(UK)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
#generate data
x1=10 * np.random.random_sample((40))
x2=np.linspace(0,10,40)
y=5 * np.random.random_sample((40))
slope, intercept, r_value, p_value, std_err = stats.linregress(x1,y)
line = (slope*x1)+intercept
plt.figure(1)
plt.subplot(211)
plt.scatter(x1,y,color='blue', marker='o')
plt.plot(x1,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
slope, intercept, r_value, p_value, std_err = stats.linregress(x2,y)
line = (slope*x2)+intercept
plt.subplot(212)
plt.scatter(x2,y,color='blue', marker='o')
plt.plot(x2,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
plt.show()
Welcome to SO.
You have already identified the problem yourself, but seem a bit surprised that a random x-array results in the line being 'cluttered'. Because the x values are unsorted, the dotted line is drawn back and forth over the same locations, so it seems like normal behavior to me that it gets smeared in places where multiple dotted segments lie on top of each other.
If you don't want that, you can sort your array and use that to calculate the regression line and plot it. Since it's a linear regression, just using the min and max values would also work.
x1_sorted = np.sort(x1)
line = (slope * x1_sorted) + intercept
or
x1_extremes = np.array([x1.min(),x1.max()])
line = (slope * x1_extremes) + intercept
The last should be faster if x1 becomes very large.
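Both options can be checked numerically. The sketch below is NumPy-only and uses `np.polyfit` as a stand-in for `stats.linregress` (an assumption made to keep the example self-contained); the seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x1 = 10 * rng.random(40)
y = 5 * rng.random(40)

# Degree-1 polyfit returns (slope, intercept), highest power first.
slope, intercept = np.polyfit(x1, y, 1)

# With unsorted x, the plotted path repeatedly steps backwards along the line.
backward_steps = np.count_nonzero(np.diff(x1) < 0)
print("backward steps along unsorted x:", backward_steps)

# Option 1: sort x so the path runs left to right exactly once.
x1_sorted = np.sort(x1)
line_sorted = slope * x1_sorted + intercept

# Option 2: a straight line only needs its two end points.
x1_extremes = np.array([x1.min(), x1.max()])
line_extremes = slope * x1_extremes + intercept

print("end points from sorted x: ", line_sorted[[0, -1]])
print("end points from extremes: ", line_extremes)
```

Plotting either `(x1_sorted, line_sorted)` or `(x1_extremes, line_extremes)` with the `'r:'` style then draws a single pass along the line, so the dots are spaced evenly.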
With regard to your last comment: in your example you use what's called the 'state-machine' environment for plotting. This means that commands are applied to the active figure and the active axes (subplots).
You can also consider the OO approach, where you get figure and axes objects. This means you can access any figure or axes at any time, not just the active one. It's useful when passing an axes to a function, for example.
In your example both would work equally well, and it is more a matter of taste.
A small example:
# create a figure with 2 subplots (2 rows, 1 column)
fig, axs = plt.subplots(2,1)
# plot in the first subplots
axs[0].scatter(x1,y,color='blue', marker='o')
axs[0].plot(x1,line,'r:',label="Regression Line")
# plot in the second
axs[1].plot()
etc...
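For completeness, here is one way the OO version could look in full, with the plotting wrapped in a function that receives the axes to draw on. This is only a sketch: `plot_regression` is a hypothetical helper, `np.polyfit` stands in for `stats.linregress`, and the `Agg` backend is selected just so the example runs without a display (drop that line and call `plt.show()` for interactive use).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

def plot_regression(ax, x, y):
    """Scatter the data and draw a dotted regression line on the given axes."""
    slope, intercept = np.polyfit(x, y, 1)
    # Two end points are enough for a straight line and keep the dots even.
    x_ends = np.array([x.min(), x.max()])
    ax.scatter(x, y, color='blue', marker='o')
    ax.plot(x_ends, slope * x_ends + intercept, 'r:', label="Regression Line")
    ax.legend(loc='upper right')
    return slope, intercept

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x1 = 10 * rng.random(40)
x2 = np.linspace(0, 10, 40)
y = 5 * rng.random(40)

# create a figure with 2 subplots (2 rows, 1 column)
fig, axs = plt.subplots(2, 1)
plot_regression(axs[0], x1, y)  # randomly spaced x values
plot_regression(axs[1], x2, y)  # evenly spaced x values
```

Because the function takes the axes as an argument, the same code serves both subplots and does not depend on which axes happens to be active.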