Stacked hue histogram - matplotlib

I don't have the reputation to add inline images I'm sorry.
This is the code I found:
bins = np.linspace(df.Principal.min(), df.Principal.max(), 10)
g = sns.FacetGrid(df, col="Gender", hue="loan_status", palette="Set1", col_wrap=2)
g.map(plt.hist, 'Principal', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
Output:
I want to do something similar with some data I have:
bins = np.linspace(df.overall.min(), df.overall.max(), 10)
g = sns.FacetGrid(df, col="player_positions", hue="preferred_foot", palette="Set1", col_wrap=4)
g.map(plt.hist, 'overall', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
The hue "preferred_foot" is just left and right.
My output:
I am not sure why I can't see the left values on the plot
df['preferred_foot'].value_counts()
Right 13960
Left 4318

I am fairly sure those are not stacked histograms, but just two histograms one behind the other. I believe your "left" red bars are simply hidden behind the "right" blue bars.
You could try adding alpha=0.5 to the g.map call, or changing the order of the hues (add hue_order=['Right', 'Left'] to the call to FacetGrid).
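To make the difference concrete, here is a minimal sketch in plain matplotlib. The data is synthetic (the `right` and `left` arrays are assumptions standing in for df.overall split by preferred_foot). The left panel overlays two histograms with alpha, the right panel uses `stacked=True` so the "Left" bars sit on top of the "Right" bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for df.overall split by preferred_foot (assumed data)
right = rng.normal(66, 7, 1400)
left = rng.normal(66, 7, 430)
bins = np.linspace(min(right.min(), left.min()), max(right.max(), left.max()), 10)

fig, (ax_overlay, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Overlaid: both histograms start at zero; alpha keeps the smaller group visible
ax_overlay.hist(right, bins=bins, ec="k", alpha=0.5, label="Right")
ax_overlay.hist(left, bins=bins, ec="k", alpha=0.5, label="Left")
ax_overlay.set_title("overlaid (alpha=0.5)")
ax_overlay.legend()

# Stacked: the returned counts are cumulative, so the last row is the total
counts, edges, _ = ax_stacked.hist(
    [right, left], bins=bins, ec="k", stacked=True, label=["Right", "Left"]
)
ax_stacked.set_title("stacked=True")
ax_stacked.legend()
```

Call plt.show() or fig.savefig(...) afterwards to look at the result.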

ggplot2 - wrap data around legend in custom position

When placing a legend in a custom position (using legend.position = c(x, y)) in a ggplot, is it possible to format the legend so that it does not overlay the data, and instead, the datapoints wrap around it?
In this example, would it be possible to, say, have ggplot insert extra space in the plot, so that datapoints are not obscured by the legend (without changing the legend.position)?
Thanks!
library(tidyverse)
data(mtcars)
ggplot(data = mtcars, aes(x = wt, y = hp))+
geom_point(aes(color = mpg))+
theme(legend.direction = "horizontal",
legend.position = c(0.5, 0.9))
An inelegant solution is to add theme(plot.title = element_text(margin = margin(a, b, c, d))), where a, b, c and d are the padding values for top, right, bottom and left, respectively, and adjust the c value until there is sufficient space. Let me know if you come up with a better solution!

How to mirror the bars

I have two bars which I want to mirror. I have the following code
bar1 = df['nt'].value_counts().plot.barh()
bar2 = df1['nt'].value_counts().plot.barh()
bar1.set_xlim(bar1.get_xlim()[::-1])
# bar1.yaxis.tick_right()
But somehow not only bar1 flips to the left (third line), but bar2 as well. The same happens with the commented-out fourth line. Why is that? How do I do it right?
df...plot.barh() doesn't return bars or a bar plot. It returns the ax, which indicates the subplot where the bar plot was added. As both bar plots are created on the same subplot, set_xlim etc. will act on that same subplot. This blogpost might be helpful.
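A quick way to check this yourself is the sketch below. The data is random and only for illustration; the variable names follow the question. Both calls hand back the very same Axes object, so flipping the limits through bar1 also flips what you see for bar2.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Random illustrative data, not the original frames
df = pd.DataFrame({"nt": rng.choice(list("abcd"), 50)})
df1 = pd.DataFrame({"nt": rng.choice(list("abcd"), 50)})

bar1 = df["nt"].value_counts().plot.barh()
bar2 = df1["nt"].value_counts().plot.barh()  # drawn onto the current Axes again

print(bar1 is bar2)  # both names refer to one Axes object

# Reversing the x limits through bar1 therefore affects "both" plots
bar1.set_xlim(bar1.get_xlim()[::-1])
print(bar2.xaxis_inverted())
```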
To get two bar plots, one growing from the left and one from the right, you could create a "twin" axes with ax.twiny() (a second x-axis that shares the same y-axis) and then draw one bar plot using the lower x-axis and the other using the upper x-axis. To make things clearer, the tick labels can be colored the same as the bars. To avoid overlapping bars, the x limits should be at least the maximum of the sum of the two value_counts.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'nt': np.random.choice([*'abcdefhij'], 50)})
df1 = pd.DataFrame({'nt': np.random.choice([*'abcdefhij'], 50)})
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same combined counts
max_sum_value_counts = pd.concat([df, df1])['nt'].value_counts().max()
fig, ax = plt.subplots(figsize=(12, 5))
df['nt'].value_counts(sort=False).sort_index().plot.barh(ax=ax, color='purple')
ax.set_xlim(0, max_sum_value_counts + 1)
ax.tick_params(labelcolor='purple')
ax1 = ax.twiny()
df1['nt'].value_counts(sort=False).sort_index().plot.barh(ax=ax1, color='crimson')
ax1.set_xlim(max_sum_value_counts + 1, 0)
ax1.tick_params(labelcolor='crimson', labelright=True, labelleft=False)
ax1.invert_yaxis()
plt.show()

fig.tight_layout() but plots still overlap

Imagine I have some dataset for wines and I find the top 5 wine producing countries:
# Find top 5 wine producing countries.
top_countries = wines_df.groupby('country').size().reset_index(name='n').sort_values('n', ascending=False)[:5]['country'].tolist()
Now that I have the values, I attempt to plot the results in 10 plots, 5 rows 2 columns.
fig = plt.figure(figsize=(16, 15))
fig.tight_layout()
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i +=1
    ax1 = fig.add_subplot(5,2,i)
    i +=1
    ax2 = fig.add_subplot(5,2,i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
plt.show()
Even with this result, I still have my subplots overlapping.
Am I doing something wrong? Using python3.6 with matplotlib==2.2.2
As Thomas Kühn said, you have to move tight_layout() after doing the plots, like in:
fig = plt.figure(figsize=(16, 15))
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i +=1
    ax1 = fig.add_subplot(5,2,i)
    i +=1
    ax2 = fig.add_subplot(5,2,i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
fig.tight_layout()
plt.show()
If it is still overlapping (this may happen in rare cases), you can specify the padding with:
fig.tight_layout(pad=0., w_pad=0.3, h_pad=1.0)
Here pad is the general padding around the figure edge, w_pad is the horizontal padding between subplots and h_pad is the vertical padding between subplots. Just try some values until your plot looks good. (pad=0., w_pad=.3, h_pad=.3) is a good start if you want your plots as tight as possible.
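Here is a stripped-down sketch of the same idea without the wine data. The line plots and titles are placeholders I made up. It reads the vertical spacing before and after, so you can see that tight_layout() only adjusts anything once the subplots and their titles exist.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 10))
hspace_before = fig.subplotpars.hspace  # default vertical spacing between rows

for i in range(1, 11):
    ax = fig.add_subplot(5, 2, i)
    ax.plot([0, 1], [0, 1])  # placeholder content
    ax.set_title("PLACEHOLDER TITLE %d" % i, fontsize=16)

# Called last, so the titles and tick labels created above are taken into account
fig.tight_layout(pad=0.5, w_pad=0.3, h_pad=1.0)
hspace_after = fig.subplotpars.hspace

print(hspace_before, hspace_after)
```

If the second number is larger than the first, the rows were pushed apart to make room for the titles.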
Another possibility is to specify constrained_layout=True in the figure:
fig = plt.figure(figsize=(16, 15), constrained_layout=True)
Now you can delete the line fig.tight_layout().
edit:
One more thing I stumbled upon:
It seems like you are specifying your figsize so that it fits on a standard DIN A4 paper in centimeters (typical textwidth: 16cm). But figsize in matplotlib is in inches. So probably replacing the figsize with figsize=(16/2.54, 15/2.54) might be better.
I know that it is absolutely confusing that matplotlib internally uses inches as units, considering that it is mostly the scientific community and data engineers working with matplotlib (and these usually use SI units). As ImportanceOfBeingErnest pointed out, there are several discussions going on about how to implement other units than inches.
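If you prefer to think in centimeters, a tiny conversion helper keeps the intent readable. Note that cm_to_inch is a hypothetical helper of mine, not something matplotlib provides.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

CM_PER_INCH = 2.54


def cm_to_inch(*lengths_cm):
    """Hypothetical helper: convert centimeter lengths to inches for figsize."""
    return tuple(length / CM_PER_INCH for length in lengths_cm)


# 16 cm x 15 cm, passed to matplotlib in inches
fig = plt.figure(figsize=cm_to_inch(16, 15))
width_in, height_in = fig.get_size_inches()
print(round(float(width_in), 2), round(float(height_in), 2))  # 6.3 5.91
```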

How do I add error bars on a histogram?

I've created a histogram to see the number of similar values in a list.
data = np.genfromtxt("Pendel-Messung.dat")
stdm = (np.std(data))/((700)**(1/2))
breite = 700**(1/2)
fig2 = plt.figure()
ax1 = plt.subplot(111)
ax1.set_ylim(0,150)
ax1.hist(data, bins=breite)
ax2 = ax1.twinx()
ax2.set_ylim(0,150/700)
plt.show()
I want to create error bars (the error being stdm) in the middle of each bar of the histogram. I know I can create errorbars using
plt.errorbar("something", data, yerr = stdm)
But how do I make them start in the middle of each bar? I thought of just adding breite/2, but that gives me an error.
Sorry, I'm a beginner! Thank you!
ax.hist returns the frequencies (n) and the bin edges, so we can use those for y and x in the call to errorbar. Also, the bins input to hist takes either an integer for the number of bins, or a sequence of bin edges. I think you were trying to give a bin width of breite? If so, this should work (you just need to select an appropriate xmax):
n,bin_edges,patches = ax.hist(data,bins=np.arange(0,xmax,breite))
x = bin_edges[:-1]+breite/2.
ax.errorbar(x,n,yerr=stdm,linestyle='None')
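For completeness, here is a self-contained variant you can run without the data file. It makes two assumptions that are mine, not the question's: the data is synthetic, and the square root of 700 is read as the number of bins rather than the bin width. The bin centers are computed from neighbouring edges, so this also works when the bins are not all the same width. The yerr value is simply the stdm from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 0.1, 700)          # synthetic stand-in for Pendel-Messung.dat
stdm = np.std(data) / np.sqrt(data.size)  # same error definition as in the question
n_bins = int(round(np.sqrt(data.size)))   # assumption: sqrt(N) is the bin count

fig, ax = plt.subplots()
n, bin_edges, patches = ax.hist(data, bins=n_bins)

# Midpoint of each bar: average of its left and right edge
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
ax.errorbar(centers, n, yerr=stdm, linestyle="None", color="k", capsize=2)
```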

How can I make this plot awesome (colours by group plus alpha value by second group)

I have the following dataframe:
I plotted it the following way:
Right now the plot looks ugly. Aside from using a different font size, marker edge width, marker face color etc., I would like to have two colors, one for each protein (hum1 and hum2), and within each group the different pH values should have different intensities. What makes it more difficult is that my groups do not have the same size.
Any ideas?
P.S. Such a built-in feature would be really cool, e.g. colourby = level_one thenby level_two
fig = plt.figure(figsize=(9,9))
ax = fig.add_subplot(1,1,1)
c1 = plt.cm.Greens(np.linspace(0.5, 1, 4))
c2 = plt.cm.Blues(np.linspace(0.5, 1, 4))
colors = np.vstack((c1,c2))
gr.unstack(level=(0,1))['conc_dil'].plot(marker='o',linestyle='-',color=colors,ax=ax)
plt.legend(loc=1,bbox_to_anchor = (0,0,1.5,1),numpoints=1)
gives:
P.S This post helped me:
stacked bar plot and colours
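Since the groups do not have the same size, the same trick can be wrapped in a small function that takes the group sizes as input. This is only a sketch: group_shades is a hypothetical helper name, and the sizes 3 and 5 are made-up examples, not read from the dataframe above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def group_shades(group_sizes, cmap_names, low=0.5, high=1.0):
    """Hypothetical helper: one colormap per group, one shade per group member."""
    blocks = []
    for size, name in zip(group_sizes, cmap_names):
        cmap = plt.get_cmap(name)
        # Evenly spaced positions in the colormap give increasing intensity
        blocks.append(cmap(np.linspace(low, high, size)))
    return np.vstack(blocks)  # RGBA rows, in the same order as the groups


# Made-up sizes: 3 pH values for the first protein, 5 for the second
colors = group_shades([3, 5], ["Greens", "Blues"])
print(colors.shape)  # (8, 4)
```

The resulting array can be passed as color=colors in the plot call, provided the columns come out in the same group order.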