changing range causes a distribution not normal - numpy

A post gives some code to plot this figure
import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-10, 11)
xU, xL = x + 0.5, x - 0.5
prob = ss.norm.cdf(xU, scale = 3) - ss.norm.cdf(xL, scale = 3)
prob = prob / prob.sum() #normalize the probabilities so their sum is 1
nums = np.random.choice(x, size = 10000, p = prob)
plt.hist(nums, bins = len(x))
I modified this line
x = np.arange(-10, 11)
to this line
x = np.arange(10, 31)
I got this figure
How to fix that?

Given what you're asking Python to do, there's no error in this plot: it's a histogram of 10,000 samples from the tail (anything that rounds to between 10 and 31) of a normal distribution with mean 0 and standard deviation 3. Since probabilities drop off steeply in the tail of a normal, it happens that none of the 10,000 exceeded 17, which is why you didn't get the full range up to 31.
If you just want the x-axis of the plot to cover your full intended range, you could add plt.xlim(9.5, 31.5) after plt.hist.
If you want a histogram with support over this entire range, then you'll need to adjust the mean and/or variance of the distribution. For instance, if you specify that your normal distribution has mean 20 rather than mean 0 when you obtain prob, i.e.
prob = ss.norm.cdf(xU, loc=20, scale=3) - ss.norm.cdf(xL, loc=20, scale=3)
then you'll recover a similar-looking histogram, just translated to the right by 20.
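To make the difference concrete, here is a small sketch comparing the bin probabilities for the two choices of mean over the new range. The helper name bin_probs and the seeded generator are my own additions for illustration, not part of the original code.

```python
import numpy as np
import scipy.stats as ss

x = np.arange(10, 31)
xU, xL = x + 0.5, x - 0.5

def bin_probs(loc, scale=3):
    """Probability of each integer bin in x, renormalized to sum to 1 (hypothetical helper)."""
    p = ss.norm.cdf(xU, loc=loc, scale=scale) - ss.norm.cdf(xL, loc=loc, scale=scale)
    return p / p.sum()

tail = bin_probs(loc=0)       # mean 0: only the far right tail falls in [10, 30]
centered = bin_probs(loc=20)  # mean 20: the bell sits in the middle of the range

# Share of the renormalized tail mass that lies above 17
mass_above_17 = tail[x > 17].sum()

rng = np.random.default_rng(0)
nums = rng.choice(x, size=10_000, p=centered)
```

With mean 0, almost all of the renormalized mass sits in the first few bins, which is why the original histogram stopped around 17. With mean 20, the samples spread symmetrically around the centre of the range.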


polynomial fitting of a signal and plotting the fitted signal

I am trying to fit a polynomial to my function (signal). After generating the coefficients, I want to put them back into the polynomial equation, compute the corresponding y-values, and plot them on the graph. But the resulting curve (orange line) is not what I expect. What am I doing wrong here?
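import math
def getYValueFromCoeff(f, coeff_list):  # low to high order
    y_plot_values = []
    for j in range(len(f)):
        item_list = []
        for i in range(len(coeff_list)):
            item = (coeff_list[i]) * ((f[j]) ** i)
            item_list.append(item)
        # sum the terms to get the polynomial's value at f[j]
        y_plot_values.append(sum(item_list))
    return y_plot_values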
from numpy.polynomial import Polynomial as poly
import numpy as np
import matplotlib.pyplot as plt
no_of_coef= 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
test1 = poly.fit(x, y, deg=no_of_coef)
coeffs= test1.coef
coef_y= getYValueFromCoeff(x, test1.coef)
plt.plot(x, coef_y)
If you check the documentation, note the two properties poly.domain and poly.window. To avoid numerical issues, the range poly.domain = [x.min(), x.max()] of the independent variable (x) passed to fit() is mapped to poly.window = [-1, 1]. This means the coefficients in poly.coef apply to that normalized range, not to the raw x values. You can change this behaviour (sacrificing numerical stability) by setting the window explicitly, which will make your curves match:
test1 = poly.fit(x, y, deg=no_of_coef, window=[x.min(), x.max()])
But unless you have a good reason to do that, I'd stick to the default behaviour of fit().
As a side note: Evaluating polynomials or lists of coefficients is already implemented in numpy, e.g. using directly
coef_y = test1(x)
or alternatively using np.polyval.
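As a rough sketch of what the domain/window mapping means in practice, the snippet below evaluates a fit three ways: by calling the fitted object, by mapping x into the window by hand, and by converting to unscaled coefficients with convert(). The degree (5) and the number of sample points (50) are my own choices to keep the fit well conditioned, not values from the question.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import polynomial as P

period = 0.01
x = np.linspace(0, period, 50)
y = np.sin(np.pi * x / period)

fit = Polynomial.fit(x, y, deg=5)
y_direct = fit(x)  # the object applies the domain -> window mapping itself

# Manual evaluation: map x from fit.domain into the default window [-1, 1],
# then evaluate the low-to-high coefficients there.
lo, hi = fit.domain
t = 2 * (x - lo) / (hi - lo) - 1
y_window = P.polyval(t, fit.coef)

# convert() re-expresses the polynomial in terms of the unscaled x
unscaled = fit.convert()
y_unscaled = P.polyval(x, unscaled.coef)
```

All three evaluations should agree to within rounding. Feeding fit.coef directly to raw x values, as in the question, skips the mapping step and so gives a different curve.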
I always like to see original solutions to problems, and I urge you to keep pursuing that, as it is the best way to learn how to fit functions programmatically. I also wanted to provide a solution closer to a standard numpy implementation. As for your custom function, you did really well. The one thing to watch is that np.polyfit, used below, returns its coefficients from high to low order, while your function counts up in powers from 0 to the highest power. Counting down from the highest power to 0 lets your function give the correct result. Notice how your function then overlays perfectly with np.polyval.
import numpy as np
import matplotlib.pyplot as plt
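def getYValueFromCoeff(f, coeff_list):  # high to low order (np.polyfit convention)
    y_plot_values = []
    for j in range(len(f)):
        item_list = []
        for i in range(len(coeff_list)):
            item = (coeff_list[i]) * ((f[j]) ** (len(coeff_list) - i - 1))
            item_list.append(item)
        # sum the terms to get the polynomial's value at f[j]
        y_plot_values.append(sum(item_list))
    return y_plot_values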
no_of_coef = 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
coeffs = np.polyfit(x,y,no_of_coef)
coef_y = np.polyval(coeffs,x)
COEF_Y = getYValueFromCoeff(x,coeffs)
plt.plot(x, y)  # original signal, matching the first legend entry
plt.plot(x, coef_y)
plt.plot(x, COEF_Y)
plt.legend(['Original Function', 'Fitted Function', 'Custom Fitting'])
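Since the ordering of coefficients is easy to mix up, here is a minimal check on a known quadratic. The test polynomial 1 + 2x + 3x^2 is just an illustrative choice of mine.

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = np.linspace(-1, 1, 20)
y = 1 + 2 * x + 3 * x**2

# Legacy API: coefficients run from the highest power down to the constant
high_to_low = np.polyfit(x, y, 2)

# numpy.polynomial API: coefficients run from the constant up to the highest power
low_to_high = P.polyfit(x, y, 2)

# Each evaluator expects the ordering of its own family
y_legacy = np.polyval(high_to_low, x)
y_new = P.polyval(x, low_to_high)
```

Note that the argument order of the two polyval functions also differs: np.polyval takes the coefficients first, while numpy.polynomial.polynomial.polyval takes x first.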
Here's the simple way of doing it if you didn't know that already...
import math
from numpy.polynomial import Polynomial as poly
import numpy as np
import matplotlib.pyplot as plt
no_of_coef= 10
#original signal
x = np.linspace(0, 0.01, 10)
period = 0.01
y = np.sin(np.pi * x / period)
#poly fit
test1 = poly.fit(x, y, deg=no_of_coef)
plt.plot(x, y, 'r', label='original y')
x = np.linspace(0, 0.01, 1000)
plt.plot(x, test1(x), 'b', label='y_fit')

Solve motion equations for first ODE using scipy

I would like to solve motion first order ODE equations using scipy solve_ivp function. I can see that I'm doing something wrong because this should be an ellipse but I'm plotting only four points. Are you able to spot the mistake?
import math
import matplotlib.pyplot as plt
import numpy as np
import scipy.integrate
gim = 4*(math.pi**2)
x0 = 1 #x-position of the center or h
y0 = 0 #y-position of the center or k
vx0 = 0 #vx position
vy0 = 1.1* 2* math.pi #vy position
initial = [x0, y0, vx0, vy0] #initial state of the system
time = np.arange(0, 1000, 0.01) #period
def motion(t, Z):
    dx = Z[2] # vx
    dy = Z[3] # vy
    dvx = -gim/(x**2+y**2)**(3/2) * x * Z[2]
    dvy = -gim/(x**2+y**2)**(3/2) * y * Z[3]
    return [dx, dy, dvx, dvy]
sol = scipy.integrate.solve_ivp(motion, t_span=time, y0= initial, method='RK45')
plt.plot(sol.y[0],sol.y[1],"x", label="Scipy RK45 solution")
The code cannot run in a fresh workspace: the variables x and y used in the gravitation formula are never defined. Insert the line x, y = Z[:2] or similar at the top of motion.
The gravitation formula usually does not contain the velocity components. Remove Z[2], Z[3].
Check again what the time span and evaluation times arguments expect. The time span takes the first two values from the array. So change to t_span=time[[0,-1]] to build the pair of first and last time value.
The second plot suffers from insufficient evaluation points, the line segments used are too large. With your time array that should not be a problem.
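Putting those points together, a corrected version could look roughly like the sketch below. The shorter time window (10 instead of 1000), the tolerances, and the energy check are my own additions to keep the example quick and verifiable; they are not part of the question.

```python
import math
import numpy as np
from scipy.integrate import solve_ivp

gim = 4 * math.pi**2
initial = [1.0, 0.0, 0.0, 1.1 * 2 * math.pi]  # x0, y0, vx0, vy0
time = np.arange(0, 10, 0.01)  # assumption: shortened from 1000 for speed

def motion(t, Z):
    x, y, vx, vy = Z                 # unpack the state instead of using globals
    r3 = (x**2 + y**2) ** 1.5
    # acceleration depends on position only, not on velocity
    return [vx, vy, -gim * x / r3, -gim * y / r3]

sol = solve_ivp(
    motion,
    t_span=(time[0], time[-1]),  # just the start and end times
    y0=initial,
    t_eval=time,                 # where the solution should be reported
    method="RK45",
    rtol=1e-8,
    atol=1e-10,
)

# Specific orbital energy should stay nearly constant along a correct solution
x, y, vx, vy = sol.y
energy = 0.5 * (vx**2 + vy**2) - gim / np.sqrt(x**2 + y**2)
```

Plotting sol.y[0] against sol.y[1] should then trace a closed ellipse rather than a handful of points.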

statsmodels: IntegrationWarning: The maximum number of subdivisions (50) has been achieved

I was trying to plot a CDF with seaborn and encountered this warning:
../venv/lib/python3.7/site-packages/statsmodels/nonparametric/ IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
args=endog)[0] for i in range(1, gridsize)]
Some minutes after pressing the return key
../venv/lib/python3.7/site-packages/statsmodels/nonparametric/ IntegrationWarning: The integral is probably divergent, or slowly convergent.
args=endog)[0] for i in range(1, gridsize)]
plt.title('my distribution')
If it could be of help:
Sample data:
[ 0.00362846 0.00123409 0.00013711 -0.00029235 0.01515175 0.02780404
0.03610236 0.03410224 0.03887933 0.0307084 ]
I have no idea what the subdivisions are. Is there a way to increase the limit?
A kde plot is created by summing one Gaussian bell shape for every data point. Summing 4 million curves will create memory and performance problems, which might cause some functions to fail. The exact error message can be very cryptic.
The easiest way to work around the problem is to subsample the data: for a more or less smooth distribution, the kde (and the cumulative kde or cdf) will look very similar whether the data is subsampled or not. Taking every 100th entry is easy using slicing: data[::100].
Alternatively, with that many data, the "real" cdf can be drawn by plotting the sorted data versus N evenly spaced numbers from 0 to 1. (Where N is the number of data points.)
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
N = 1000000
data = np.random.normal(np.repeat(np.random.uniform(10, 20, 10), N // 10), 1)
sns.kdeplot(data[::100], cumulative=True, color='g', label='cumulative kde')
q = np.linspace(0, 1, data.size)
data.sort()  # the cdf is the sorted data plotted against evenly spaced quantiles
plt.plot(data, q, ':r', lw=2, label='cdf from sorted data')
plt.legend()
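To get a feel for how little subsampling changes the cdf, here is a numpy-only sketch that compares the empirical cdf of a full sample with that of every 100th entry. The single-Gaussian test data, the seed, and the comparison grid are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=15, scale=2, size=1_000_000)

full_sorted = np.sort(data)
sub_sorted = np.sort(data[::100])  # every 100th entry

# Empirical cdf at a few grid points: fraction of values <= each grid point
grid = np.linspace(10, 20, 11)
cdf_full = np.searchsorted(full_sorted, grid, side="right") / full_sorted.size
cdf_sub = np.searchsorted(sub_sorted, grid, side="right") / sub_sorted.size

max_gap = np.max(np.abs(cdf_full - cdf_sub))
```

With 10,000 points left after subsampling, the two curves should differ by only around a percent, which is invisible on most plots.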
Note that in a similar, though slightly more involved, way you can draw a "more honest" kde given the differences of a large enough array of sorted data. np.interp interpolates the quantiles to a regularly spaced x-axis. As the raw differences are rather jaggy, some smoothing is needed.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import statsmodels.api as sm
N = 1000000
data = np.random.normal(np.repeat(np.random.uniform(10, 20, 10), N // 10), 1)
sns.kdeplot(data[::100], cumulative=False, color='g', label='kde')
p = np.linspace(0, 1, data.size)
x = np.linspace(data.min(), data.max(), 1000)
data.sort()  # np.interp needs increasing x-coordinates
y = np.interp(x, data, p)
# use lowess filter to smoothen the curve
lowess = sm.nonparametric.lowess(np.diff(y) * 1000 / (data.max() - data.min()), (x[:-1] + x[1:]) / 2, frac=0.05)
plt.plot(lowess[:, 0], lowess[:, 1], '-r', label='smoothed diff of sorted data')
# plt.plot((x[:-1]+x[1:])/2,
# np.convolve(np.diff(y), np.ones(20)/20, mode='same')*1000/(data.max() - data.min()),
# label='test np.diff')

Multiple plots in a single window

I need to draw many such rows (for a0 .. a128) in a single window. I've searched in FacetGrid, PairGrid and all over around but couldn't find. Only regplot has similar argument ax but it doesn't plot histograms. My data is 128 real valued features with label column [0, 1]. I need the graphs to be shown from my Python code as a separate application on Linux.
Also, is there a way to scale this histogram to show relative values on Y such that the right curve is not skewed?
g = sns.FacetGrid(df, col="Result")
g.map(plt.hist, "a0", bins=20)
Just a simple example using matplotlib. The code is not optimized (ugly, but simple plot-indexing):
import numpy as np
import matplotlib.pyplot as plt
N = 5
data = np.random.normal(size=(N*N, 1000))
f, axarr = plt.subplots(N, N) # maybe you want sharex=True, sharey=True
pi = [0,0]
for i in range(data.shape[0]):
    if pi[1] == N:
        pi[0] += 1  # next row
        pi[1] = 0  # first column again
    axarr[pi[0], pi[1]].hist(data[i], density=True)  # density=True shows relative values;
    # the older normed=True keyword was removed in Matplotlib 3.1
    pi[1] += 1
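On the question of relative values on the Y axis: a density-normalized histogram has unit area regardless of how many samples are in each group, so a small class is not dwarfed by a large one. A numpy-only sketch of the idea, with made-up group sizes and bin edges:

```python
import numpy as np

rng = np.random.default_rng(0)
big = rng.normal(loc=0.0, size=5000)   # e.g. the majority label
small = rng.normal(loc=1.0, size=200)  # e.g. the minority label

edges = np.linspace(-5, 6, 21)  # shared bins so the two groups are comparable
widths = np.diff(edges)

counts_big, _ = np.histogram(big, bins=edges)
counts_small, _ = np.histogram(small, bins=edges)

dens_big, _ = np.histogram(big, bins=edges, density=True)
dens_small, _ = np.histogram(small, bins=edges, density=True)

area_big = np.sum(dens_big * widths)
area_small = np.sum(dens_small * widths)
```

Raw counts differ by a factor of about 25 between the groups, while both density curves integrate to 1. Passing density=True to each hist call in the grid above has the same effect.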

inverse of FFT not the same as original function

I don't understand why ifft(fft(myFunction)) is not the same as my function. It seems to be the same shape but out by a factor of 2 (ignoring the constant y-offset). All the documentation I can see says there is some normalisation that fft doesn't do, but that ifft should take care of that. Here's some example code below; you can see where I've bodged the factor of 2 to give me the right answer. Thanks for any help, it's driving me nuts.
import numpy as np
import scipy.fftpack as fftp
import matplotlib.pyplot as plt
def fourier_series(x, y, wn, n=None):
    # get FFT
    myfft = fftp.fft(y, n)
    # kill higher freqs above wavenumber wn
    myfft[wn:] = 0
    # make new series
    y2 = fftp.ifft(myfft).real
    # find constant y offset
    c = fftp.ifft(myfft)[0]
    # remove c, apply factor of 2 and re apply c
    y2 = (y2-c)*2 + c
    plt.plot(x, y, x, y2)
if __name__=='__main__':
    x = np.array([float(i) for i in range(0,360)])
    y = np.sin(2*np.pi/360*x) + np.sin(2*2*np.pi/360*x) + 5
    fourier_series(x, y, 3, 360)
You're removing half the spectrum when you do myfft[wn:] = 0. The negative frequencies are those in the top half of the array and are required.
You have a second fudge to get your results which is taking the real part to find y2: y2 = fftp.ifft(myfft).real (fftp.ifft(myfft) has a non-negligible imaginary part due to the asymmetry in the spectrum).
Fix it with myfft[wn:-wn] = 0 instead of myfft[wn:] = 0, and remove the fudges. So the fixed code looks something like:
import numpy as np
import scipy.fftpack as fftp
import matplotlib.pyplot as plt
def fourier_series(x, y, wn, n=None):
    # get FFT
    myfft = fftp.fft(y, n)
    # kill higher freqs above wavenumber wn
    myfft[wn:-wn] = 0
    # make new series
    y2 = fftp.ifft(myfft)
    plt.plot(x, y, x, y2)
if __name__=='__main__':
    x = np.array([float(i) for i in range(0,360)])
    y = np.sin(2*np.pi/360*x) + np.sin(2*2*np.pi/360*x) + 5
    fourier_series(x, y, 3, 360)
It's really worth paying attention to the interim arrays that you create when doing signal processing. Invariably, there are clues as to what is going wrong that should direct you to the problem. In this case, taking the real part masked the problem and made your task more difficult.
Just to add another quick point: Sometimes taking the real part of the resultant array is exactly the correct thing to do. It's often the case that you end up with an imaginary part to the signal output which is just down to numerical errors in the input to the inverse FFT. Typically this manifests itself as very small imaginary values, so taking the real part is basically the same array.
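To see both effects side by side, here is a sketch that inspects the interim arrays for the one-sided and two-sided truncation. I have used np.fft here instead of scipy.fftpack, on the assumption that the two behave the same for this purpose; the variable names are mine.

```python
import numpy as np

n = 360
x = np.arange(n, dtype=float)
y = np.sin(2 * np.pi * x / n) + np.sin(2 * 2 * np.pi * x / n) + 5
wn = 3

spec = np.fft.fft(y)

one_sided = spec.copy()
one_sided[wn:] = 0      # also wipes the negative frequencies at the end of the array
y_one = np.fft.ifft(one_sided)

two_sided = spec.copy()
two_sided[wn:-wn] = 0   # keeps the low frequencies at both ends
y_two = np.fft.ifft(two_sided)

imag_one = np.max(np.abs(y_one.imag))  # large: the spectrum is no longer symmetric
imag_two = np.max(np.abs(y_two.imag))  # tiny: only rounding noise remains

# The "factor of 2" fudge from the question recovers y from the one-sided result
y_one_fudged = (y_one.real - 5) * 2 + 5
```

In the one-sided case each sine keeps only half of its spectral weight, which is where the factor of 2 comes from, and the missing half shows up as a sizeable imaginary part. In the two-sided case the imaginary part is negligible, so taking .real there is harmless.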
You are killing the negative frequencies between 0 and -wn.
I think what you mean to do is to set myfft to 0 for all frequencies outside [-wn, wn].
Change the following line:
myfft[wn:] = 0
to:
myfft[wn:-wn] = 0