How to display centroids for categorical variables instead of arrows using function ggord? - ggplot2

I really can’t figure out how to display just the centroids for my categorical variables using the function ggord. If anybody could help me, that would be great.
Here is an example of what I’m trying to achieve using the dune data set:
library(vegan)
library (ggord)
library(ggplot2)
ord <- rda(dune~Moisture+ Management+A1,dune.env)
#first plot
plot(ord)
# second plot
ggord(ord)
#I tried to add the centroids, but somehow the whole plot seems to be differently scaled?
centroids<-ord$CCA$centroids
ggord(ord)+geom_point(aes(centroids[,1],centroids[,2]),pch=4,cex=5,col="black",data=as.data.frame(centroids))
In the first plot only the centroids (instead of arrows) for moisture and management are displayed. In the ggord plot every variable is displayed with an arrow.
And why do these plots look so different? The scales of the axes is totally different?

Something like this could work - you can use the var_sub argument to retain specific predictors (e.g., continuous), then just plot others on top of the ggord object.
library(vegan)
library(ggord)
library(ggplot2)
data(dune)
data(dune.env)
ord <- rda(dune~Moisture+ Management+A1,dune.env)
# get centroids for factors
centroids <- data.frame(ord$CCA$centroids)
centroids$labs <- row.names(centroids)
# retain only continuous predictors, then add factor centroids
ggord(ord, var_sub = 'A1') +
geom_text(data = centroids, aes(x = RDA1, y = RDA2, label = labs))

Related

Zooming a pherical projection in matplotlib

I need to display a catalogue of galaxies projected on the sky. Not all the sky is relevant here, so I need to center an zoom on the relevant part. I am OK with more or less any projection, like Lambert, Mollweide, etc. Here are mock data and code sample, using Mollweide:
# Generating mock data
np.random.seed(1234)
(RA,Dec)=(np.random.rand(100)*60 for _ in range(2))
# Creating projection
projection='mollweide'
fig = plt.figure(figsize=(20, 10));
ax = fig.add_subplot(111, projection=projection);
ax.scatter(np.radians(RA),np.radians(Dec));
# Creating axes
xtick_labels = ["$150^{\circ}$", "$120^{\circ}$", "$90^{\circ}$", "$60^{\circ}$", "$30^{\circ}$", "$0^{\circ}$",
"$330^{\circ}$", "$300^{\circ}$", "$270^{\circ}$", "$240^{\circ}$", "$210^{\circ}$"]
labels = ax.set_xticklabels(xtick_labels, fontsize=15);
ytick_labels = ["$-75^{\circ}$", "$-60^{\circ}$", "$-45^{\circ}$", "$-30^{\circ}$", "$-15^{\circ}$",
"$0^{\circ}$","$15^{\circ}$", "$30^{\circ}$", "$45^{\circ}$", "$60^{\circ}$",
"$75^{\circ}$", "$90^{\circ}$"]
ax.set_yticklabels(ytick_labels,fontsize=15);
ax.set_xlabel("RA");
ax.xaxis.label.set_fontsize(20);
ax.set_ylabel("Dec");
ax.yaxis.label.set_fontsize(20);
ax.grid(True);
The result is the following:
I have tried various set_whateverlim, set_extent, clip_box and so on, as well as importing cartopy and passing ccrs.LambertConformal(central_longitude=...,central_latitude=...) as arguments. I was unable to get a result.
Furthermore, I would like to shift RA tick labels down, as they are difficult to read with real data. Unfortunately, ax.tick_params(pad=-5) doesn't do anything.

How to display all the lables present on x and y axis in matplotlib [duplicate]

I'm playing around with the abalone dataset from UCI's machine learning repository. I want to display a correlation heatmap using matplotlib and imshow.
The first time I tried it, it worked fine. All the numeric variables plotted and labeled, seen here:
fig = plt.figure(figsize=(15,8))
ax1 = fig.add_subplot(111)
plt.imshow(df.corr(), cmap='hot', interpolation='nearest')
plt.colorbar()
labels = df.columns.tolist()
ax1.set_xticklabels(labels,rotation=90, fontsize=10)
ax1.set_yticklabels(labels,fontsize=10)
plt.show()
successful heatmap
Later, I used get_dummies() on my categorical variable, like so:
df = pd.get_dummies(df, columns = ['sex'])
resulting correlation matrix
So, if I reuse the code from before to generate a nice heatmap, it should be fine, right? Wrong!
What dumpster fire is this?
So my question is, where did my labels go, and how do I get them back?!
Thanks!
To get your labels back, you can force matplotlib to use enough xticks so that all your labels can be shown. This can be done by adding
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
before your statements ax1.set_xticklabels(labels,rotation=90, fontsize=10) and ax1.set_yticklabels(labels,fontsize=10).
This results in the following plot:

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking into the code how seaborn plots a box/violin plot https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961)
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many nans
Is there any possibility to prevent this collapsing by code only
I do not want to modify my dataframe (replace the nans by zero)
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question
#Diziet should I delete it or does my issue might help other ones?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

Visualizing Data, Tracking Specific SD Values

BLUF: I want to track a specific Std Dev, e.g. 1.0 to 1.25, by color coding it and making a separate KDF or other probability density graph.
What I want to do with this is be able to pick out other Std Dev ranges and get back new graphs that I can turn around and use to predict outcomes in that specific Std Dev.
Data: https://www.dropbox.com/s/y78pynq9onyw9iu/Data.csv?dl=0
What I have so far is normalized data that looks like a shotgun blast:
Code used to produce it:
data = pd.read_csv("Data.csv")
sns.jointplot(data.x,data.y, space=0.2, size=10, ratio=2, kind="reg");
What I want to achieve here looks like what I have marked up below:
I kind of know how to do this in RStudio using RidgePlot-type functions, but I'm at a loss here in Python, even while using Seaborn. Any/All help appreciated!
The following code might point you in the right directly, you can tweak the appearance of the plot as you please from there.
tips = sns.load_dataset("tips")
g = sns.jointplot(x="total_bill", y="tip", data=tips)
top_lim = 4
bottom_lim = 2
temp = tips.loc[(tips.tip>=bottom_lim)&(tips.tip<top_lim)]
g.ax_joint.axhline(top_lim, c='k', lw=2)
g.ax_joint.axhline(bottom_lim, c='k', lw=2)
# we have to create a secondary y-axis to the joint-plot, otherwise the
# kde might be very small compared to the scale of the original y-axis
ax_joint_2 = g.ax_joint.twinx()
sns.kdeplot(temp.total_bill, shade=True, color='red', ax=ax_joint_2, legend=False)
ax_joint_2.spines['right'].set_visible(False)
ax_joint_2.spines['top'].set_visible(False)
ax_joint_2.yaxis.set_visible(False)

Add a new axis to the right/left/top-right of an axis

How do you add an axis to the outside of another axis, keeping it within the figure as a whole? legend and colorbar both have this capability, but implemented in rather complicated (and for me, hard to reproduce) ways.
You can use the subplots command to achieve this, this can be as simple as py.subplot(2,2,1) where the first two numbers describe the geometry of the plots (2x2) and the third is the current plot number. In general it is better to be explicit as in the following example
import pylab as py
# Make some data
x = py.linspace(0,10,1000)
cos_x = py.cos(x)
sin_x = py.sin(x)
# Initiate a figure, there are other options in addition to figsize
fig = py.figure(figsize=(6,6))
# Plot the first set of data on ax1
ax1 = fig.add_subplot(2,1,1)
ax1.plot(x,sin_x)
# Plot the second set of data on ax2
ax2 = fig.add_subplot(2,1,2)
ax2.plot(x,cos_x)
# This final line can be used to adjust the subplots, if uncommentted it will remove all white space
#fig.subplots_adjust(left=0.13, right=0.9, top=0.9, bottom=0.12,hspace=0.0,wspace=0.0)
Notice that this means things like py.xlabel may not work as expected since you have two axis. Instead you need to specify ax1.set_xlabel("..") this makes the code easier to read.
More examples can be found here.