Show the pdf of a chi-squared distribution using python - chi-squared

I'm trying to reconstruct the pdf of the chi-squared distribution with 3 degrees of freedom from a simulated sample. Here is my python code:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
norm = stats.norm(0, 1)
x1 = [x * x for x in np.random.randn(1000)]
x2 = [x * x for x in np.random.randn(1000)]
x3 = [x * x for x in np.random.randn(1000)]
f = x1 + x2 + x3
plt.hist(f, 100)
plt.show()
The result I got was this.
Of course this is wrong. As shown on Wikipedia, the pdf of the chi-squared distribution with 3 degrees of freedom should first rise from zero and then fall, not keep climbing like mine does. Is there anything wrong with my code? The formula I used was as follows:
Q = x1^2 + x2^2 + x3^2
where x1, x2 and x3 are independent, standard normal random variables.

I ran your code and got the same result as you. If you instead use your 'norm' variable to generate the random values, it works, since norm.rvs returns NumPy arrays rather than lists.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
norm = stats.norm(0, 1)
x1 = norm.rvs(size=100000)**2
x2 = norm.rvs(size=100000)**2
x3 = norm.rvs(size=100000)**2
f = x1 + x2 + x3
# density=True scales the histogram so that it integrates to 1
plt.hist(f, 60, density=True)
# Plot the theoretical density of f
x = np.arange(0, 30, .05)
plt.plot(x, stats.chi2.pdf(x, df=3), color='r', lw=2)
plt.show()
The result I got was

The '+' operator works differently on Python lists than on Numpy arrays.
f = x1 + x2 + x3
concatenates three lists into one. However, you want to add the content of the three lists element-wise, which could be done like this:
f = np.array(x1) + np.array(x2) + np.array(x3)
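The difference can be checked directly. The following is a minimal sketch, not part of the original answers: the seeded generator `rng` and the sample size `n` are assumptions chosen only to make the run reproducible. It compares the length of the concatenated list with the shape of the element-wise sum, and uses the fact that a chi-squared variable with k degrees of freedom has mean k and variance 2k as a quick sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
n = 100_000

# Plain Python lists: '+' concatenates.
a = [v * v for v in rng.standard_normal(n)]
b = [v * v for v in rng.standard_normal(n)]
c = [v * v for v in rng.standard_normal(n)]
concatenated = a + b + c
print(len(concatenated))  # 3 * n values, each one a single squared normal

# NumPy arrays: '+' adds element-wise.
f = np.array(a) + np.array(b) + np.array(c)
print(f.shape)  # (n,)

# For 3 degrees of freedom the sample mean should be near 3
# and the sample variance near 6.
print(f.mean(), f.var())
```

The concatenated list is just a larger sample of a chi-squared distribution with 1 degree of freedom, which is why the original histogram peaks at zero.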

Related

How to plot a 3D function with colors given spacing 2D input

Let's assume I have 3 arrays defined as:
v1=np.linspace(1,100)
v2=np.linspace(1,100)
v3=np.linspace(1,100)   
Then I have a function that takes those 3 values and gives me the desired output, let's assume it is like:
f = (v1 + v2*10)/v3
I want to plot that function on a 3D plot with axes v1, v2, v3 and color its surface depending on its value.
Beyond the best way to plot it, I am also interested in how to iterate over all the values in the input vectors and build the function point by point.
I have been trying with nested for loops, but I always get an error.
Many thanks.
I tried this, but I always get a line instead of a surface:
import math
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
import sympy
from sympy import symbols, Function
# Parameters I use in the function
L = 132
alpha = 45*math.pi/180
beta = 0
s,t = symbols('s,t')
z = Function('z')(s,t)
figure = plt.figure(figsize=(8,8))
ax = figure.add_subplot(1, 1, 1, projection='3d')
# experiment with various range of data in x and y
x1 = np.linspace(-40,-40,100)
y1 = np.linspace(-40,40,100)
x,y = np.meshgrid(x1,y1)
# My function Z
c1=math.cos(beta)**2
c2=math.cos(alpha)**2
s1=math.sin(alpha)**2
den = math.sqrt((c1*c2)+s1)
z=L*((math.cos(beta)/den)-1)+(s*(math.sin(alpha)))+(t*(1-math.cos(alpha)))
ax.plot_surface(x,y,z,cmap='rainbow')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
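The example below shows one way to achieve your goal. It uses NumPy, whose vectorized operations make for loops unnecessary, and it evaluates z directly on the meshgrid arrays x and y instead of on the symbolic s and t. Note also that x1 must span a range, as in np.linspace(-40,40,100); with np.linspace(-40,-40,100) every x is identical, which collapses the surface to a line.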
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import matplotlib.cm as cm
# Parameters I use in the function
L = 132
alpha = 45*np.pi/180
beta = 0
figure = plt.figure()
ax = figure.add_subplot(1, 1, 1, projection='3d')
# experiment with various range of data in x and y
x1 = np.linspace(-40,40,100)
y1 = np.linspace(-40,40,100)
x,y = np.meshgrid(x1,y1)
# My function Z
c1=np.cos(beta)**2
c2=np.cos(alpha)**2
s1=np.sin(alpha)**2
den = np.sqrt((c1*c2)+s1)
z=L*((np.cos(beta)/den)-1)+(x*(np.sin(alpha)))+(y*(1-np.cos(alpha)))
# compute the color values according to some other function
color_values = np.sqrt(x**2 + y**2 + z**2)
# normalize color values between 0 and 1
norm = Normalize(vmin=color_values.min(), vmax=color_values.max())
norm_color_values = norm(color_values)
# choose a colormap and create colors starting from the normalized values
cmap = cm.rainbow
colors = cmap(norm_color_values)
surf = ax.plot_surface(x,y,z,facecolors=colors)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
# add a colorbar (ax tells the figure which axes to take the space from)
figure.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="radius")
plt.show()
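For the three-variable function from the question, f = (v1 + v2*10)/v3, the same vectorized idea applies. This is a minimal sketch rather than part of the original answer; the names V1, V2, V3 and F are introduced here for illustration. np.meshgrid builds every combination of the three input vectors, so the function is evaluated at all points without nested loops.

```python
import numpy as np

v1 = np.linspace(1, 100)  # np.linspace defaults to 50 points
v2 = np.linspace(1, 100)
v3 = np.linspace(1, 100)

# indexing='ij' keeps the axis order (v1, v2, v3) in the output arrays.
V1, V2, V3 = np.meshgrid(v1, v2, v3, indexing='ij')

# Evaluated for every combination at once; no explicit loops needed.
F = (V1 + V2 * 10) / V3

print(F.shape)     # one value per (v1, v2, v3) combination
print(F[0, 0, 0])  # (1 + 1*10) / 1
```

Since f depends on three inputs, it is a scalar field over a volume rather than a surface. One option is to pass the flattened arrays to a 3D scatter plot, for example ax.scatter(V1.ravel(), V2.ravel(), V3.ravel(), c=F.ravel()), so that the color of each point encodes the value.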

Matplotlib 3d barplot failing to draw just one face

import numpy as np
import matplotlib.pyplot as plt
x, y = np.array([[x, y] for x in range(5) for y in range(x+1)]).T
z = 1/ (5*x + 5)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.bar3d(x, y, np.zeros_like(z), dx=1, dy=1, dz=z)
plt.show()
yields
How do I get the face at (1,0) to display properly?
There is currently no good solution to this. Fortunately though, it happens only for some viewing angles. So you can choose an angle where it plots fine, e.g.
ax.view_init(azim=-60, elev=25)

How to draw polar hist2d/hexbin in matplotlib?

I have a random vector (random length and random angle) and would like to plot its approximate PDF (probability density function) via hist2d or hexbin. Unfortunately they do not seem to work with polar plots; the following code yields nothing:
import numpy as np
import matplotlib.pyplot as plt
# Generate random data:
N = 1024
r = .5 + np.random.normal(size=N, scale=.1)
theta = np.pi / 2 + np.random.normal(size=N, scale=.1)
# Plot:
ax = plt.subplot(111, polar=True)
ax.hist2d(theta, r)
plt.savefig('foo.png')
plt.close()
I would like it to look like the pylab_examples demo hist2d_demo.py, only in polar coordinates. The closest result so far is a colored scatter plot, as advised here:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate random data:
N = 1024
r = .5 + np.random.normal(size=N, scale=.1)
theta = np.pi / 2 + np.random.normal(size=N, scale=.1)
# Plot:
ax = plt.subplot(111, polar=True)
# Using approach from:
# https://stackoverflow.com/questions/20105364/how-can-i-make-a-scatter-plot-colored-by-density-in-matplotlib
theta_r = np.vstack([theta,r])
z = gaussian_kde(theta_r)(theta_r)
ax.scatter(theta, r, c=z, s=10, edgecolor='none')
plt.savefig('foo.png')
plt.close()
Image from the second version of the code
Is there a better way to make it look more like a real PDF generated with hist2d? This question seems to be relevant (the resulting image is as expected), but it looks messy.
One way to do this is with pcolormesh:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate random data:
N = 10000
r = .5 + np.random.normal(size=N, scale=.1)
theta = np.pi / 2 + np.random.normal(size=N, scale=.1)
# Histogramming
nr = 50
ntheta = 200
r_edges = np.linspace(0, 1, nr + 1)
theta_edges = np.linspace(0, 2*np.pi, ntheta + 1)
H, _, _ = np.histogram2d(r, theta, [r_edges, theta_edges])
# Plot
ax = plt.subplot(111, polar=True)
Theta, R = np.meshgrid(theta_edges, r_edges)
ax.pcolormesh(Theta, R, H)
plt.show()
Result:
Note that the histogram is not yet normalized by the area of the bin, which is not constant in polar coordinates. Close to the origin, the bins are pretty small, so some other kind of meshing might be better.
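The area normalization mentioned above can be sketched as follows. This is an illustrative addition, not part of the original answer: the names ring, bin_area and density are introduced here, and the seeded generator is an assumption made for reproducibility. Each polar bin is an annular sector with area 0.5 * (r_outer^2 - r_inner^2) * dtheta, so dividing the counts by that area (and by the total count) gives an estimate of the density per unit area in the plane.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
N = 10000
r = 0.5 + rng.normal(scale=0.1, size=N)
theta = np.pi / 2 + rng.normal(scale=0.1, size=N)

nr, ntheta = 50, 200
r_edges = np.linspace(0, 1, nr + 1)
theta_edges = np.linspace(0, 2 * np.pi, ntheta + 1)
H, _, _ = np.histogram2d(r, theta, [r_edges, theta_edges])

# Area of each annular sector: 0.5 * (r_outer^2 - r_inner^2) * dtheta
ring = 0.5 * np.diff(r_edges**2)            # shape (nr,)
dtheta = np.diff(theta_edges)               # shape (ntheta,)
bin_area = ring[:, None] * dtheta[None, :]  # shape (nr, ntheta)

# Counts -> density per unit area, normalized over the samples inside the grid.
density = H / (H.sum() * bin_area)

print(bin_area.sum())              # total area of the unit disc
print((density * bin_area).sum())  # integrates to 1 over the grid
```

The resulting density array has the same shape as H, so it can be passed to ax.pcolormesh(Theta, R, density) in place of H.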

Get the y value of a given x

I have a simple question but have not found an answer.
Let's have a look at this code :
from matplotlib import pyplot
import numpy
x=[0,1,2,3,4]
y=[5,3,40,20,1]
pyplot.plot(x,y)
It is plotted and all the points are linked.
Let's say I want to get the y value at x=1.3.
How can I get the x values that match y=30? (There are two.)
Many thanks for your help.
You could use shapely to find the intersections:
import matplotlib.pyplot as plt
import numpy as np
import shapely.geometry as SG
x=[0,1,2,3,4]
y=[5,3,40,20,1]
line = SG.LineString(list(zip(x,y)))
y0 = 30
yline = SG.LineString([(min(x), y0), (max(x), y0)])
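# collect the (x, y) pairs of the intersection points (assumes the result is a MultiPoint)
coords = np.array([(p.x, p.y) for p in line.intersection(yline).geoms])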
print(coords[:, 0])
fig, ax = plt.subplots()
ax.axhline(y=y0, color='k', linestyle='--')
ax.plot(x, y, 'b-')
ax.scatter(coords[:, 0], coords[:, 1], s=50, c='red')
plt.show()
finds solutions for x at:
[ 1.72972973 2.5 ]
The following code might do what you want. Interpolating y(x) is straightforward, since the x-values are monotonically increasing. Finding the x-values for a given y is harder when the function is not monotonic, as in this case, so you still need to know roughly where to expect the solutions.
import numpy as np
import scipy.interpolate
import scipy.optimize
x=np.array([0,1,2,3,4])
y=np.array([5,3,40,20,1])
# if the independent variable is monotonically increasing
print(np.interp(1.3, x, y))
# if not, as in the case of finding x(y) here,
# we need to find the zeros of an interpolating function
y0 = 30.
initial_guess = 1.5  # for the first zero
# initial_guess = 3.0  # for the second zero
f = scipy.interpolate.interp1d(x, y, kind="linear")
fmin = lambda x: np.abs(f(x) - y0)
s = scipy.optimize.fmin(fmin, initial_guess, disp=False)
print(s)
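I use Python 3.
import numpy
x = [0, 1, 2, 3, 4]
y = [5, 3, 40, 20, 1]
print(numpy.interp(1.3, x, y))
Y = 30
eps = 1e-6
j = 0
# walk over consecutive segments (x0, y0)-(x1, y1) of the plotted line
for i, ((x0, x1), (y0, y1)) in enumerate(zip(zip(x[:-1], x[1:]), zip(y[:-1], y[1:]))):
    dy = y1 - y0
    if abs(dy) < eps:
        # horizontal segment: either no solution or infinitely many
        if y0 == Y:
            print('There are infinite number of solutions')
    else:
        # position of Y along the segment, as a fraction between 0 and 1
        t = (Y - y0)/dy
        if 0 < t < 1:
            sol = x0 + (x1 - x0)*t
            print('solution #{}: {}'.format(j, sol))
            j += 1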
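The same segment-by-segment idea can be written without an explicit loop. This is a minimal sketch, and the helper name find_x_for_y is hypothetical: it looks for sign changes of y - y0 between neighbouring points and interpolates linearly inside each such segment. As written it assumes x is increasing and it does not report solutions that fall exactly on a data point.

```python
import numpy as np

def find_x_for_y(x, y, y0):
    """Return the x positions where the polyline through (x, y) crosses y0."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(y, dtype=float) - y0
    # Segments whose end points lie on opposite sides of y0.
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0]
    # Fraction of the way along each crossing segment.
    t = d[idx] / (d[idx] - d[idx + 1])
    return x[idx] + t * (x[idx + 1] - x[idx])

x = [0, 1, 2, 3, 4]
y = [5, 3, 40, 20, 1]
print(find_x_for_y(x, y, 30))
```

For the data in the question this finds the same two crossings as the shapely answer above, without needing an initial guess.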

Plotting date data with pcolor

I have data like this:
dates = ['1874-05-02', '1874-05-03', '1874-05-04',
'1874-05-05', '1874-05-06','1874-05-07']
data1 = ['-7.000', '7.000', '2.000', '11.600', '13.500', '-13.500']
data2 = ['0.000', '25.000', '0.000', '75.000', '12.000', '22.000']
and I need to draw a diagram with the dates on the x-axis and data1 on the y-axis. Data2 is needed to draw the dots in the diagram, and each dot should be colored according to its value. How can I do this with pcolor or pcolormesh?
Here is example code I found at http://matplotlib.org/examples/pylab_examples/pcolor_demo.html; could I get anything like this out of my data? Here is another link that demonstrates what I'm supposed to do: https://dl.dropboxusercontent.com/u/47527320/diagram.jpg. Can I get a diagram like this with pcolor?
import matplotlib.pyplot as plt
import numpy as np
dx, dy = 0.15, 0.05
y, x = np.mgrid[slice(-3, 3 + dy, dy),slice(-3, 3 + dx, dx)]
z = (1 - x / 2. + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)
z = z[:-1, :-1]
z_min, z_max = -np.abs(z).max(), np.abs(z).max()
plt.subplot(2, 2, 1)
plt.pcolor(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max)
plt.title('pcolor')
plt.axis([x.min(), x.max(), y.min(), y.max()])
plt.colorbar()
plt.show()
A scatter plot will give what you describe.
import numpy as np
import matplotlib.pyplot as plt
import datetime
dt = datetime.datetime
# integer literals with leading zeros (such as 05) are a syntax error in Python 3
dates = [dt(1874, 5, 2), dt(1874, 5, 3), dt(1874, 5, 4), dt(1874, 5, 5), dt(1874, 5, 6), dt(1874, 5, 7)]
data1 = [-7.000, 7.000, 2.000, 11.600, 13.500, -13.500]
data2 = [0.000, 25.000, 0.000, 75.000, 12.000, 22.000]
# c=data2 colors each dot by its data2 value
plt.scatter(dates, data1, c=data2, s=400)
plt.show()
There was some discussion in the comments about needing 2D data, but I think that was due to lack of clarity of what you were looking for. The types of plots in your mpl example link and your sketch are completely different in nature. Take a look through the mpl gallery page and you'll see that the ones like your sketch (and that also match the structure of your data well) are using a scatter plot.
There are lots of options here for how to handle the dates and colors, but this should get you started.
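Since the data in the question is given as strings, it has to be converted before plotting. The following is a small sketch of that step under the assumption that the dates always use the YYYY-MM-DD layout shown in the question; the names parsed_dates, y_values and color_values are introduced here for illustration.

```python
from datetime import datetime

dates = ['1874-05-02', '1874-05-03', '1874-05-04',
         '1874-05-05', '1874-05-06', '1874-05-07']
data1 = ['-7.000', '7.000', '2.000', '11.600', '13.500', '-13.500']
data2 = ['0.000', '25.000', '0.000', '75.000', '12.000', '22.000']

# Convert the strings to datetime objects and floats.
parsed_dates = [datetime.strptime(d, '%Y-%m-%d') for d in dates]
y_values = [float(v) for v in data1]
color_values = [float(v) for v in data2]

print(parsed_dates[0], y_values[0], color_values[3])
```

The converted lists can then be used in the scatter call above, as in plt.scatter(parsed_dates, y_values, c=color_values, s=400), and plt.colorbar() can be added to show which color corresponds to which data2 value.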