Annotate a data point with a graph - matplotlib

For lack of a better term, is there a way to annotate a data point with a graph? I include an example of what I am looking for below.
A big black data point with a graph corresponding to it. Note that the graph is rotated so its "x" axis (not shown) is perpendicular to the "y" axis of the scatter plot.
annotation_box (http://matplotlib.org/examples/pylab_examples/demo_annotation_box.html) is the closest thing I can find at the moment, but even knowing the proper term for what I want to do would make my life easier.

If I understood the problem correctly, what you need are floating axes that you can place as annotations over your plot. Unfortunately, this is not easily possible in matplotlib, as far as I know.
An easy solution would be to just plot the points and graphs in the same axis, with the graphs scaled down and shifted close to the points.
import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt
xp = [5, 1, 3]
yp = [2, 1, 4]
# just generate some curves
curves_x = np.array([np.linspace(0, 10, 100)] * 3)
curves_y = sps.gamma.pdf(curves_x[0], [[2], [5], [7]], 1)
plt.scatter(xp, yp, s=50)
for x, y, cx, cy in zip(xp, yp, curves_x, curves_y):
    plt.plot(x + cy / np.max(cy) + 0.1, y + cx / np.max(cx) - 0.5)
plt.show()
This is a very simplistic example. The numbers will have to be tuned to look nice with varying scale of the data.
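That said, if you do want something closer to literal floating axes, one rough way (whether it counts as "easy" is debatable) is to convert the data point into figure coordinates and drop a small axes there with fig.add_axes. A minimal sketch, using made-up point and curve data; note the inset will not track the point if the figure is later resized or the limits change:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter([5], [2], s=50, color='k')
ax.set_xlim(0, 10)
ax.set_ylim(0, 5)
# Convert the annotated data point (5, 2) from data to figure coordinates.
x_fig, y_fig = fig.transFigure.inverted().transform(ax.transData.transform((5, 2)))
# Drop a small floating axes just to the right of the point and draw the curve in it.
inset = fig.add_axes([x_fig + 0.02, y_fig, 0.15, 0.2])
t = np.linspace(0, 10, 100)
inset.plot(t, np.sin(t))
inset.set_xticks([])
inset.set_yticks([])
plt.show()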

Related

Need help displaying 4D data in matplotlib 3D scatterplot properly

Hey so I'm an undergraduate working in an imaging lab and I have a 3D numpy array that has values from 0-9 to indicate concentration in a 3D space. I'm trying to plot these values in a scatterplot with a colormap to indicate the value between 0-9. The array is 256 x 256 x 48, so I feel like the size of it is making it difficult for me to plot the array in a meaningful way.
I've attached a picture of what it looks like right now. As you can see the concentration looks very "faded" even for very high values and I'm not entirely sure why. Here is the code I'm using to generate the plot:
(image: current heatmap)
fig = plt.figure()
x, y, z = np.meshgrid(range(256), range(256), range(48))
col = sum_array.flatten()
ax = fig.add_subplot(111, projection = '3d')
sc = ax.scatter(x, y, z, c=col, cmap='Reds',
                linewidths=.01, s=.03, vmin=0, vmax=9,
                marker='.', alpha=1)
plt.colorbar(sc)
plt.show()
If anyone can help me display the colors in a more bright/concentrated manner so the heat map is visually useful, I'd really appreciate it. Thank you!
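One likely cause of the faded look is that all 256 x 256 x 48 (about 3.1 million) voxels are drawn, most of them zero, with a very small marker size and linewidth. A hedged sketch of one common workaround, plotting only the non-zero voxels (sum_array below is random stand-in data, and indexing='ij' is used so the voxel coordinates line up with the flattened array):
import numpy as np
import matplotlib.pyplot as plt
# Random stand-in for the 256 x 256 x 48 concentration array from the question.
sum_array = np.zeros((256, 256, 48))
sum_array[100:120, 100:120, 20:30] = np.random.randint(1, 10, (20, 20, 10))
# indexing='ij' keeps the voxel coordinates aligned with sum_array.flatten().
x, y, z = np.meshgrid(range(256), range(256), range(48), indexing='ij')
col = sum_array.flatten()
# Draw only the voxels with non-zero concentration, with a larger marker,
# so they are not washed out by millions of empty points.
keep = col > 0
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
sc = ax.scatter(x.flatten()[keep], y.flatten()[keep], z.flatten()[keep],
                c=col[keep], cmap='Reds', s=4, vmin=0, vmax=9, marker='.')
ax.set_xlim(0, 256)
ax.set_ylim(0, 256)
ax.set_zlim(0, 48)
plt.colorbar(sc)
plt.show()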

Create 2D hanning, hamming, blackman, gaussian window in NumPy

I am interested in creating 2D Hanning, Hamming, Blackman, etc. windows in NumPy. I know that off-the-shelf functions exist in NumPy for the 1D versions, such as np.blackman(51), np.hamming(51), np.kaiser(51), np.hanning(51), etc.
How can I create 2D versions of them? I am not sure whether the following solution is the correct way.
window1d = np.blackman(51)
window2d = np.sqrt(np.outer(window1d,window1d))
EDIT:
The concern is that np.sqrt expects only non-negative values, while np.outer(window1d, window1d) will definitely have some negative values. One solution is to drop np.sqrt.
Any suggestions on how to extend these 1D functions to 2D?
That looks reasonable to me. If you want to verify what you are doing is sensible, you can try plotting out what you are creating.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1.5, 51)
y = np.linspace(0, 1.5, 51)
window1d = np.abs(np.blackman(51))
window2d = np.sqrt(np.outer(window1d,window1d))
X, Y = np.meshgrid(x, y)
Z = window2d
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');
plt.show()
This gives a surface that looks like the 2D generalization of the corresponding 1D plot (plots not shown here).
However, I had to use window1d = np.abs(np.blackman(51)) when creating the 1D version, because otherwise you end up with small negative values in the final 2D array, which you cannot take the square root of.
Disclaimer: I am not familiar with these functions or their usual use case. But the shapes of these plots seem to make sense. If the use case for these functions is one in which the actual values matter, this could be off.
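As a quick numerical check of the negative-values concern (a sketch, assuming the outer-product construction from the question):
import numpy as np
# The endpoints of np.blackman are tiny negative numbers (round-off around zero),
# so the raw outer product can dip below zero.
w = np.blackman(51)
print(np.outer(w, w).min())        # a very small negative value
# Taking the absolute value first, as in the answer, keeps sqrt happy.
w_abs = np.abs(w)
window2d = np.sqrt(np.outer(w_abs, w_abs))
print(np.isnan(window2d).any())    # False: no NaNs produced by the square root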

Ticks position in heatmap with categorical data (seaborn)

I am trying to plot a confusion matrix of my predictions. My data is multi-class (13 different labels) so I'm using a heatmap.
As you can see below, my heatmap looks generally okay, but the labels are a bit out of position: the y ticks should be a little lower and the x ticks should be a bit more to the right. I want to move both axes' ticks so that they are aligned with the center of each square.
my code:
sns.set()
my_mask = np.zeros((con_matrix.shape[0], con_matrix.shape[0]), dtype=int)
for i in range(con_matrix.shape[0]):
    for j in range(con_matrix.shape[0]):
        my_mask[i][j] = con_matrix[i][j] == 0
fig_dims = (10, 10)
plt.subplots(figsize=fig_dims)
ax = sns.heatmap(con_matrix, annot=True, fmt="d", linewidths=.5, cmap="Pastel1", cbar=False, mask=my_mask, vmax=15)
plt.xticks(range(len(party_names)), party_names, rotation=45)
plt.yticks(range(len(party_names)), party_names, rotation='horizontal')
plt.show()
and for reproduction purpose, here are con_matrix and party_names hard-coded:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
con_matrix = np.array([[55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
                       [0, 199, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1],
                       [0, 0, 52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                       [0, 0, 0, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 90, 0, 0, 0, 0, 0, 0, 4, 3],
                       [0, 0, 0, 1, 0, 35, 0, 0, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 5, 0, 26, 0, 0, 1, 0, 1, 0],
                       [0, 5, 0, 0, 0, 1, 0, 44, 0, 0, 3, 0, 1],
                       [0, 1, 0, 0, 0, 0, 0, 0, 52, 0, 0, 0, 0],
                       [0, 1, 0, 0, 2, 0, 0, 0, 0, 235, 0, 1, 1],
                       [1, 2, 0, 0, 0, 0, 0, 3, 0, 0, 34, 0, 3],
                       [0, 0, 0, 0, 5, 0, 0, 0, 0, 1, 0, 40, 0],
                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 46]])
party_names = ['Blues', 'Browns', 'Greens', 'Greys', 'Khakis', 'Oranges', 'Pinks', 'Purples', 'Reds', 'Turquoises', 'Violets', 'Whites', 'Yellows']
I have already tried working with the position argument of different axes, but it did not turn out well. I could not find an exact answer on this site either (at least not a solution that works for categorical data).
I'm new to visualization with seaborn, so any improvements with explanations would be appreciated (not only for my problem, but for my code and visualization in general).
You can shift both sets of tick labels by an offset of 0.5 to get the desired alignment. To do so, I have used NumPy's arange, which enables vectorized addition of 0.5 to the whole array.
plt.xticks(np.arange(len(party_names))+0.5, party_names, rotation=45)
plt.yticks(np.arange(len(party_names))+0.5, party_names, rotation='horizontal')
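For context, a minimal sketch putting the shifted ticks together with the heatmap call (reusing con_matrix and party_names exactly as defined above; sns.set() and the mask are omitted for brevity):
# Reuses con_matrix and party_names exactly as defined above.
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(con_matrix, annot=True, fmt="d", linewidths=.5,
            cmap="Pastel1", cbar=False, vmax=15, ax=ax)
# Each heatmap cell spans [i, i + 1] along its axis, so the cell center sits at i + 0.5.
plt.xticks(np.arange(len(party_names)) + 0.5, party_names, rotation=45)
plt.yticks(np.arange(len(party_names)) + 0.5, party_names, rotation='horizontal')
plt.show()
Alternatively, passing xticklabels=party_names and yticklabels=party_names directly to sns.heatmap should also place the labels at the cell centers.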

Matplotlib graphic's line smoothing

(image: bullets' trajectory comparison)
I'm a new Python user. I'm using this powerful tool for scientific research and data analysis.
I'm writing my thesis in physics, trying to describe and analyze the external ballistics behind a bullet's flight.
I'm using matplotlib to draw plots representing the bullet's parabolic path and the related crossing points. Given that, I'd like to know whether there is a way to smooth the lines drawn from the real experimental data, so that the plot is not made up of many straight segments.
Thanks a lot to all of you!
Francesco
All right Francesco, thanks for uploading the image. Now, let's have some fun with coding.
First, I suggest using the NumPy function that fits a polynomial of a given degree to a set of values: np.polyfit(). Be aware of the degree you set, as the results can change widely. For more information, please take a look at the documentation: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.polyfit.html
Then, in order to smooth your curve, increase the number of points at which the function is drawn with np.linspace(), and evaluate the fitted polynomial on this new set with np.poly1d() (it computes the y coordinates based on the fit you did with polyfit).
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x = [0, 50, 100, 150, 200, 250]
y = [-1, 0.8, 1.9, 1.6, 0, -3]
z = np.polyfit(x, y, 2)
p = np.poly1d(z)
xp = np.linspace(-2, 255)
plt.plot(x, y, '.', xp, p(xp), '-')
plt.show()
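If the curve should pass exactly through the measured points rather than being approximated by a single low-degree polynomial, a cubic spline from SciPy is another option. A sketch with the same sample data; note that make_interp_spline interpolates the points, so it will not smooth out measurement noise:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
# The same hypothetical trajectory samples as above.
x = np.array([0, 50, 100, 150, 200, 250])
y = np.array([-1, 0.8, 1.9, 1.6, 0, -3])
# Cubic spline through the measured points, evaluated on a dense grid.
spline = make_interp_spline(x, y, k=3)
x_dense = np.linspace(x.min(), x.max(), 300)
plt.plot(x, y, '.', x_dense, spline(x_dense), '-')
plt.show()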

Estimation of t-distribution by mean of samples does not work

I am trying to create a t-distribution by taking the mean of many samples from a normal distribution (and then estimating the shape with kernel density estimation).
For some reason, I am getting pretty different results when I compare what I get with a proper t-distribution. I don't understand what is going wrong, so I think I am confused about something.
Here is the code:
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import seaborn
inner_sample_size = 10
X = np.arange(-3, 3, 0.01)
results = [
    np.mean(np.random.normal(size=inner_sample_size))
    for _ in range(10000)
]
estimation = gaussian_kde(results)
plt.plot(X, estimation.evaluate(X))
t_samples = np.random.standard_t(inner_sample_size, 10000)
t_estimator = gaussian_kde(t_samples)
plt.plot(X, t_estimator.evaluate(X))
plt.ylabel("Probability density")
plt.show()
And here is the plot I get:
The orange line is numpy's own t-distribution, and the blue line is the one estimated by sampling.
Your assumption that the mean of standard normals has a t distribution is incorrect. In fact, the mean of standard normals has a normal distribution, which explains the shape of your blue graph. To generate one random variable T from a t distribution with k degrees of freedom, you first generate k + 1 independent standard normals Z_i, i = 0, ..., k. You then compute
T = Z_0 / sqrt( (Z_1^2 + ... + Z_k^2) / k ).
The sum of squared standard normals, Z_1^2 + ... + Z_k^2, has a chi-squared distribution with k degrees of freedom, so if there is a pre-canned method to generate it, you should use it, since it is likely more efficient.
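A minimal sketch of that construction, reusing the question's KDE comparison (the variable names are my own; numpy.random.chisquare provides the pre-canned chi-squared generator mentioned above):
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
k = 10                                     # degrees of freedom
n = 10000                                  # number of samples
X = np.arange(-3, 3, 0.01)
# T = Z_0 / sqrt(chi2_k / k), using numpy's chi-squared generator for the denominator.
z0 = np.random.normal(size=n)
chi2 = np.random.chisquare(k, size=n)
t_manual = z0 / np.sqrt(chi2 / k)
# Compare against numpy's built-in Student's t generator; the two curves should overlap.
t_builtin = np.random.standard_t(k, n)
plt.plot(X, gaussian_kde(t_manual).evaluate(X), label='Z_0 / sqrt(chi2 / k)')
plt.plot(X, gaussian_kde(t_builtin).evaluate(X), label='np.random.standard_t')
plt.ylabel("Probability density")
plt.legend()
plt.show()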