Visualizing Data, Tracking Specific SD Values - data-visualization

BLUF: I want to track a specific Std Dev range, e.g. 1.0 to 1.25, by color coding it and making a separate KDE or other probability density graph.
I then want to be able to pick out other Std Dev ranges and get back new graphs that I can use to predict outcomes within that specific Std Dev range.
Data: https://www.dropbox.com/s/y78pynq9onyw9iu/Data.csv?dl=0
What I have so far is normalized data that looks like a shotgun blast:
Code used to produce it:
import pandas as pd
import seaborn as sns
data = pd.read_csv("Data.csv")
sns.jointplot(x="x", y="y", data=data, space=0.2, height=10, ratio=2, kind="reg");
What I want to achieve here looks like what I have marked up below:
I kind of know how to do this in RStudio using RidgePlot-type functions, but I'm at a loss here in Python, even while using Seaborn. Any/All help appreciated!

The following code might point you in the right direction; you can tweak the appearance of the plot as you please from there.
import seaborn as sns
tips = sns.load_dataset("tips")
g = sns.jointplot(x="total_bill", y="tip", data=tips)
top_lim = 4
bottom_lim = 2
temp = tips.loc[(tips.tip>=bottom_lim)&(tips.tip<top_lim)]
g.ax_joint.axhline(top_lim, c='k', lw=2)
g.ax_joint.axhline(bottom_lim, c='k', lw=2)
# we have to create a secondary y-axis to the joint-plot, otherwise the
# kde might be very small compared to the scale of the original y-axis
ax_joint_2 = g.ax_joint.twinx()
sns.kdeplot(x=temp.total_bill, fill=True, color='red', ax=ax_joint_2, legend=False)
ax_joint_2.spines['right'].set_visible(False)
ax_joint_2.spines['top'].set_visible(False)
ax_joint_2.yaxis.set_visible(False)
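Building on the band-selection idea above, here is a minimal sketch of how the cut-offs could be expressed as Std Dev (z-score) ranges rather than raw values. It uses synthetic data in place of Data.csv, and the helper names select_sd_band and gaussian_kde_1d are made up for illustration. The density is a plain NumPy Gaussian KDE with Silverman's rule-of-thumb bandwidth, swapped in for seaborn's kdeplot so that the example only needs NumPy and matplotlib.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt


def select_sd_band(x, y, lo, hi):
    """Return the x and y values whose y z-score lies in [lo, hi)."""
    z = (y - y.mean()) / y.std()
    mask = (z >= lo) & (z < hi)
    return x[mask], y[mask]


def gaussian_kde_1d(samples, grid):
    """Gaussian KDE evaluated on grid, Silverman's rule-of-thumb bandwidth."""
    n = samples.size
    h = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    u = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


# Synthetic stand-in for Data.csv (assumption: two numeric columns, x and y)
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.5 * x + rng.normal(size=5000)

# Pick the band of interest, here 1.0 to 1.25 standard deviations in y
x_band, y_band = select_sd_band(x, y, 1.0, 1.25)
grid = np.linspace(x.min() - 1, x.max() + 1, 400)
density = gaussian_kde_1d(x_band, grid)

fig, ax = plt.subplots()
ax.scatter(x, y, s=4, color="lightgray")
ax.scatter(x_band, y_band, s=4, color="red")  # color-code the chosen band
ax2 = ax.twinx()                               # separate scale for the density
ax2.fill_between(grid, density, color="red", alpha=0.3)
ax2.yaxis.set_visible(False)
```

Calling select_sd_band with a different pair, such as (-0.5, 0.0), returns the points for another band, and the same KDE helper gives its density.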

Related

How to display centroids for categorical variables instead of arrows using function ggord?

I really can’t figure out how to display just the centroids for my categorical variables using the function ggord. If anybody could help me, that would be great.
Here is an example of what I’m trying to achieve using the dune data set:
library(vegan)
library(ggord)
library(ggplot2)
ord <- rda(dune~Moisture+ Management+A1,dune.env)
#first plot
plot(ord)
# second plot
ggord(ord)
#I tried to add the centroids, but somehow the whole plot seems to be differently scaled?
centroids<-ord$CCA$centroids
ggord(ord)+geom_point(aes(centroids[,1],centroids[,2]),pch=4,cex=5,col="black",data=as.data.frame(centroids))
In the first plot, only the centroids (instead of arrows) are displayed for Moisture and Management. In the ggord plot, every variable is displayed with an arrow.
And why do these plots look so different? The scales of the axes are totally different.
Something like this could work - you can use the var_sub argument to retain specific predictors (e.g., continuous), then just plot others on top of the ggord object.
library(vegan)
library(ggord)
library(ggplot2)
data(dune)
data(dune.env)
ord <- rda(dune~Moisture+ Management+A1,dune.env)
# get centroids for factors
centroids <- data.frame(ord$CCA$centroids)
centroids$labs <- row.names(centroids)
# retain only continuous predictors, then add factor centroids
ggord(ord, var_sub = 'A1') +
  geom_text(data = centroids, aes(x = RDA1, y = RDA2, label = labs))

Why is the point size in sns.lmplot different from the one I get with plt.scatter?

I want to make a scatterplot of x against y in which the point size depends on a numeric variable and the color of each point depends on a categorical variable.
First, I tried this with plt.scatter:
Graph 1
Then I tried the same thing with lmplot, but the point sizes differ from those in the first graph.
I think the two graphs should be identical. Why aren't they?
Graph 2
Your question is not very descriptive, but I guess you want to control the size of the markers. The seaborn scatterplot documentation covers this in more detail.
Here is a starting point for you.
A numeric variable can also be assigned to size to apply a semantic mapping to the areas of the points:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size", size="size")
For seaborn scatterplot:
df = sns.load_dataset("anscombe")
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df)
And to change the size of the points you use the s parameter.
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df, s=100)
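For comparison, here is a minimal matplotlib-only sketch of the original goal: a numeric variable mapped to marker size and a categorical variable mapped to color. The data and the names value and group are made up for illustration. Note that matplotlib's s is the marker area in points squared, and that lmplot passes anything given in scatter_kws (for example scatter_kws={"s": 100}) on to the same scatter call, which is one way to make the sizes in the two plots agree.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Made-up data standing in for the asker's DataFrame
rng = np.random.default_rng(42)
n = 60
x = rng.uniform(0, 10, n)
y = 2 * x + rng.normal(0, 3, n)
value = rng.uniform(1, 5, n)             # numeric variable -> marker size
group = rng.choice(["a", "b", "c"], n)   # categorical variable -> color

# Scale the numeric variable to marker areas between 20 and 200 points^2
sizes = 20 + 180 * (value - value.min()) / (value.max() - value.min())

fig, ax = plt.subplots()
for name in np.unique(group):
    mask = group == name
    # One scatter call per category, so each gets its own color and legend entry
    ax.scatter(x[mask], y[mask], s=sizes[mask], label=name, alpha=0.7)
ax.legend(title="group")
```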

Zooming a spherical projection in matplotlib

I need to display a catalogue of galaxies projected on the sky. Not all of the sky is relevant here, so I need to center and zoom on the relevant part. I am OK with more or less any projection, like Lambert, Mollweide, etc. Here are mock data and a code sample using Mollweide:
import numpy as np
import matplotlib.pyplot as plt
# Generating mock data
np.random.seed(1234)
(RA,Dec)=(np.random.rand(100)*60 for _ in range(2))
# Creating projection
projection='mollweide'
fig = plt.figure(figsize=(20, 10));
ax = fig.add_subplot(111, projection=projection);
ax.scatter(np.radians(RA),np.radians(Dec));
# Creating axes (raw strings, so that \c is not read as an escape sequence)
xtick_labels = [r"$150^{\circ}$", r"$120^{\circ}$", r"$90^{\circ}$", r"$60^{\circ}$", r"$30^{\circ}$", r"$0^{\circ}$",
r"$330^{\circ}$", r"$300^{\circ}$", r"$270^{\circ}$", r"$240^{\circ}$", r"$210^{\circ}$"]
labels = ax.set_xticklabels(xtick_labels, fontsize=15);
# The default latitude grid has 11 ticks (-75 to 75), so the 90 degree label is dropped
ytick_labels = [r"$-75^{\circ}$", r"$-60^{\circ}$", r"$-45^{\circ}$", r"$-30^{\circ}$", r"$-15^{\circ}$",
r"$0^{\circ}$", r"$15^{\circ}$", r"$30^{\circ}$", r"$45^{\circ}$", r"$60^{\circ}$",
r"$75^{\circ}$"]
ax.set_yticklabels(ytick_labels,fontsize=15);
ax.set_xlabel("RA");
ax.xaxis.label.set_fontsize(20);
ax.set_ylabel("Dec");
ax.yaxis.label.set_fontsize(20);
ax.grid(True);
The result is the following:
I have tried various set_whateverlim, set_extent, clip_box and so on, as well as importing cartopy and passing ccrs.LambertConformal(central_longitude=...,central_latitude=...) as arguments. I was unable to get a result.
Furthermore, I would like to shift RA tick labels down, as they are difficult to read with real data. Unfortunately, ax.tick_params(pad=-5) doesn't do anything.

How do I create a bar chart that starts and ends in a certain range

I created a computer model (just for fun) to predict soccer match results. I ran a simulation to predict how many points each team will gain, which gives me a list of simulation results for each team.
I want to plot something like a confidence interval, but using a bar chart.
I considered the following options:
I considered using matplotlib's candlestick, but these are not Forex prices.
I also considered using matplotlib's errorbar, especially since it turns out I can combine a bar graph with error bars, but it's not really what I am aiming for. I am actually aiming for something like Nate Silver's 538 election prediction results.
Nate Silver's chart is too complex: he colors the distribution and varies the size of the percentages. I just want a simple bar chart that spans a certain range.
I don't want to resort to stacking bar plots.
Matplotlib's barh (or bar) is probably suitable for this:
import numpy as np
import matplotlib.pylab as pl
x_mean = np.array([1, 3, 6 ])
x_std = np.array([0.3, 1, 0.7])
y = np.array([0, 1, 2 ])
pl.figure()
pl.barh(y, width=2*x_std, left=x_mean-x_std)
The bars have a horizontal width of 2*x_std and start at x_mean-x_std, so the center denotes the mean value.
It's not very pretty (yet), but highly customizable:
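As one possible customization, here is a minimal sketch that starts from per-team simulation results rather than precomputed means. The team names and simulated values are made up, and the bars span the 5th to 95th percentile instead of the mean plus or minus one standard deviation used above; either choice plugs into barh the same way through left and width.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical simulation output: simulated points per team
rng = np.random.default_rng(1)
simulated_points = {
    "Team A": rng.normal(70, 6, size=1000),
    "Team B": rng.normal(55, 8, size=1000),
    "Team C": rng.normal(40, 5, size=1000),
}

teams = list(simulated_points)
low = np.array([np.percentile(simulated_points[t], 5) for t in teams])
high = np.array([np.percentile(simulated_points[t], 95) for t in teams])
median = np.array([np.median(simulated_points[t]) for t in teams])

fig, ax = plt.subplots()
y = np.arange(len(teams))
# Each bar starts at the 5th percentile and ends at the 95th
bars = ax.barh(y, width=high - low, left=low, color="steelblue", alpha=0.6)
ax.plot(median, y, "k|", markersize=20)  # tick mark at the median
ax.set_yticks(y)
ax.set_yticklabels(teams)
ax.set_xlabel("Simulated points (5th to 95th percentile)")
```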

Multiplot with matplotlib without knowing the number of plots before running

I have a problem with Matplotlib's subplots. I do not know the number of subplots I want to plot beforehand, but I know that I want them in two rows, so I cannot use
plt.subplot(212)
because I don't know the number that I should provide.
It should look like this:
Right now, I save all the plots into a folder and put them together with Illustrator, but there has to be a better way with Matplotlib. I can provide my code if I was unclear somewhere.
My understanding is that you only know the number of plots at runtime and hence are struggling with the shorthand syntax, e.g.:
plt.subplot(121)
Thankfully, to save you having to do some awkward math to figure out this number programmatically, there is another interface which allows you to use the form:
plt.subplot(n_rows, n_cols, plot_num)
So in your case, given you want n plots in two rows, you can do:
n_plots = 5  # (or however many you programmatically figure out you need)
n_rows = 2
n_cols = (n_plots + 1) // n_rows  # round up so every plot gets a cell
# plot_num is 1-based; an index of 0 raises a ValueError
for plot_num in range(1, n_plots + 1):
    ax = plt.subplot(n_rows, n_cols, plot_num)
    # ... do some plotting
Alternatively, there is also a slightly more pythonic interface which you may wish to be aware of:
fig, subplots = plt.subplots(n_rows, n_cols)
# plt.subplots returns an array of axes here; .flat iterates over all of them
for ax in subplots.flat:
    pass  # ... do some plotting
(Notice that this was subplots() not the plain subplot()). Although I must admit, I have never used this latter interface.
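A minimal runnable sketch of the subplots() route, assuming a two-row layout and placeholder sine curves as the plotted data. When the number of plots is odd, one cell of the grid is left over, so the sketch hides it.

```python
import math
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt

n_plots = 5                            # only known at runtime
n_rows = 2
n_cols = math.ceil(n_plots / n_rows)   # enough columns to fit every plot

# squeeze=False keeps `axes` two-dimensional even when n_cols == 1
fig, axes = plt.subplots(n_rows, n_cols, squeeze=False,
                         figsize=(4 * n_cols, 6))
x = np.linspace(0, 2 * np.pi, 100)

for i, ax in enumerate(axes.flat):
    if i < n_plots:
        ax.plot(x, np.sin((i + 1) * x))   # placeholder data
        ax.set_title(f"plot {i + 1}")
    else:
        ax.set_visible(False)             # hide the unused cell
```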
HTH