I'd like to create a categorical plot of two pandas DataFrame columns, a and b, in the same figure with a shared x-axis and separate y-axes:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
example = [
('exp1','f0', 0.25, 2),
('exp1','f1', 0.5, 3),
('exp1','f2', 0.75, 4),
('exp2','f1', -0.25, 1),
('exp2','f2', 1, 2),
('exp2','f3', 0, 3)
]
df = pd.DataFrame(example, columns=['exp', 'split', 'a', 'b'])
mean_df = df.groupby('exp')['a'].mean()
g = sns.catplot(x='exp', y='a', data=df, jitter=False)
ax2 = plt.twinx()
sns.catplot(x='exp', y='b', data=df, jitter=False, ax=ax2)
In this implementation, the colors differ by category (x-value) rather than by column. Can I solve this, or do I have to change the data structure?
I would also like to connect the means of the categorical values, as in this image:
You may want to melt your data first:
data = df.melt(id_vars='exp', value_vars=['a','b'])
fig, ax = plt.subplots()
sns.scatterplot(data=data,
x='exp',
hue='variable',
y='value',
ax=ax)
(data.groupby(['exp','variable'])['value']
.mean()
.unstack('variable')
.plot(ax=ax, legend=False)
)
ax.set_xlim(-0.5, 1.5);
Output:
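For reference, here is a minimal sketch of what the melt step does to the example data, without any plotting. It rebuilds the question's DataFrame and prints the long-format table plus the per-experiment means that the line plot connects. The name `means` is only for illustration.

```python
import pandas as pd

example = [
    ('exp1', 'f0', 0.25, 2),
    ('exp1', 'f1', 0.5, 3),
    ('exp1', 'f2', 0.75, 4),
    ('exp2', 'f1', -0.25, 1),
    ('exp2', 'f2', 1, 2),
    ('exp2', 'f3', 0, 3),
]
df = pd.DataFrame(example, columns=['exp', 'split', 'a', 'b'])

# Long format: one row per (exp, variable) observation,
# so 'variable' can be passed to hue.
data = df.melt(id_vars='exp', value_vars=['a', 'b'])

# One mean per experiment (rows) and per original column (columns).
means = data.groupby(['exp', 'variable'])['value'].mean().unstack('variable')

print(data.head())
print(means)
```

Because the column name now lives in the `variable` column, coloring by column becomes an ordinary `hue='variable'` argument.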
df = pd.DataFrame(example, columns=['exp', 'split', 'a', 'b'])
# Select the numeric columns explicitly; 'split' holds strings and cannot be averaged
mean_df = df.groupby('exp')[['a', 'b']].mean().reset_index()
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
sns.scatterplot(x='exp', y='a', data=df, color='C0', ax=ax1)
sns.scatterplot(x='exp', y='b', data=df, color='C1', ax=ax2)
sns.lineplot(x='exp',y='a', data=mean_df, color='C0', ax=ax1)
sns.lineplot(x='exp',y='b', data=mean_df, color='C1', ax=ax2)
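With twin axes and fixed colors there is no automatic legend saying which color is which column. One possible way to add one is sketched below. It uses plain matplotlib calls instead of seaborn to keep the example small, and a reduced copy of the question's data. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'exp': ['exp1', 'exp1', 'exp1', 'exp2', 'exp2', 'exp2'],
    'a': [0.25, 0.5, 0.75, -0.25, 1, 0],
    'b': [2, 3, 4, 1, 2, 3],
})
mean_df = df.groupby('exp')[['a', 'b']].mean().reset_index()

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis that shares the x-axis of ax1

ax1.scatter(df['exp'], df['a'], color='C0', label='a')
ax2.scatter(df['exp'], df['b'], color='C1', label='b')
ax1.plot(mean_df['exp'], mean_df['a'], color='C0')
ax2.plot(mean_df['exp'], mean_df['b'], color='C1')

# Collect the legend entries of both axes and draw a single legend.
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend(h1 + h2, l1 + l2, loc='upper left')

# Color the axis labels to match the data drawn on each axis.
ax1.set_ylabel('a', color='C0')
ax2.set_ylabel('b', color='C1')
```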
The first image is the figure I'm trying to reproduce, and the second image is the data I have. Does anyone have a clean way to do this with pandas or matplotlib?
Just transpose the DataFrame and use df.plot with the stacked flag set to true:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'squad': [0.6616, 0.1245, 0.0950],
'quac': [0.83, 0.065, 0.0176],
'quoref': [0.504, 0.340364, 0.1067]})
# Transpose
plot_df = df.T
# plot
ax = plot_df.plot(kind='bar', stacked=True, rot='horizontal')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
ax.set_ylabel("% of Questions")
plt.tight_layout()
plt.show()
You can try this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = {'squad':[0.661669, 0.127516, 0.095005],
'quac':[0.930514, 0.065951, 0.017680],
'quoref': [0.504963, 0.340364, 0.106700]}
df = pd.DataFrame(data)
bars_1 = df.iloc[0]
bars_2 = df.iloc[1]
bars_3 = df.iloc[2]
# Heights of bars_1 + bars_2
bars_1_to_2 = np.add(bars_1, bars_2).tolist()
# The position of the bars on the x-axis
r = [0, 1, 2]
plt.figure(figsize = (7, 7))
plt.bar(r, bars_1, color = 'lightgrey', edgecolor = 'white')
plt.bar(r, bars_2, bottom = bars_1, color = 'darkgrey', edgecolor = 'white')
plt.bar(r, bars_3, bottom = bars_1_to_2, color = 'dimgrey', edgecolor = 'white')
plt.yticks(np.arange(0, 1.1, 0.1))
plt.xticks(ticks = r, labels = df.columns)
plt.ylabel('% of Questions')
plt.show()
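The manual stacking above hard-codes three rows. A possible generalization, sketched here with the same numbers, keeps a running `bottom` array and loops over the rows, so it works for any number of stacked segments. The variable names and the color list are illustrative choices, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'squad': [0.661669, 0.127516, 0.095005],
                   'quac': [0.930514, 0.065951, 0.017680],
                   'quoref': [0.504963, 0.340364, 0.106700]})

x = np.arange(len(df.columns))          # one bar position per dataset
bottom = np.zeros(len(df.columns))      # current top of each stack
colors = ['lightgrey', 'darkgrey', 'dimgrey']

fig, ax = plt.subplots(figsize=(7, 7))
for (_, row), color in zip(df.iterrows(), colors):
    heights = row.to_numpy()
    ax.bar(x, heights, bottom=bottom, color=color, edgecolor='white')
    bottom = bottom + heights           # the next segment starts here

ax.set_xticks(x)
ax.set_xticklabels(df.columns)
ax.set_ylabel('% of Questions')
```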
I want 3 rows of subplots each of different widths, but which all share the same X-axis, such as in the rough mock-up below. How can I do this? Can I use sharex=True even in GridSpec-adjusted plots?
You can place the axes by hand, or another method is to use an inset_axes:
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(3, 1, constrained_layout=True, sharex=True, sharey=True)
ylim=[-3, 3]
axs[2].plot(np.random.randn(500))
axs[2].set_ylim(ylim)
xlim = axs[2].get_xlim()
ax0 = axs[0].inset_axes([300, ylim[0], xlim[1]-300, ylim[1]-ylim[0]], transform=axs[0].transData)
ax0.set_ylim(ylim)
ax0.set_xlim([300, xlim[1]])
axs[0].axis('off')
ax0.plot(np.arange(300, 500), np.random.randn(200))
ax1 = axs[1].inset_axes([150, ylim[0], xlim[1] - 150, ylim[1]-ylim[0]], transform=axs[1].transData)
ax1.set_ylim(ylim)
ax1.set_xlim([150, xlim[1]])
axs[1].axis('off')
ax1.plot(np.arange(150, 500), np.random.randn(350))
plt.show()
You can pass the axes to share with as a reference when you create each subplot:
import matplotlib.gridspec
import matplotlib.pyplot as plt
fig = plt.figure()
gs = matplotlib.gridspec.GridSpec(3,3, figure=fig)
ax1 = fig.add_subplot(gs[0,2])
ax2 = fig.add_subplot(gs[1,1:], sharex=ax1)
ax3 = fig.add_subplot(gs[2,:], sharex=ax1)
ax1.plot([1,5,0])
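As a quick check that sharex really links GridSpec-placed axes, the sketch below sets the limits on one axes and reads them back from the others. Note that sharing only links the data limits. Since the three axes have different physical widths, the same x-value will not sit at the same horizontal position in each row, which may or may not be what the mock-up needs.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure()
gs = GridSpec(3, 3, figure=fig)
ax1 = fig.add_subplot(gs[0, 2])               # narrow, right-aligned
ax2 = fig.add_subplot(gs[1, 1:], sharex=ax1)  # two thirds of the width
ax3 = fig.add_subplot(gs[2, :], sharex=ax1)   # full width

# Changing the limits on any one of the shared axes updates all of them.
ax3.set_xlim(0, 10)
print(ax1.get_xlim(), ax2.get_xlim())
```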
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
myDict = {'a':[3,13,18,16,19,9,13,15,0,2],
'b':[23,14,18,24,19,9,14,13,21,22],
'c':[38,17,12,15,39,38,23,19,16,16]}
df = pd.DataFrame(myDict)
df_melted = df.melt(value_vars=['a','b','c'])
fig,ax1 = plt.subplots()
sns.barplot(x='variable',y='value',data=df_melted,capsize=0.1,ax=ax1,order=['b','a','c'])
plt.
plt.show()
plt.close()
Use a lineplot, but first fix the category order yourself, because lineplot has no order argument the way barplot does. The steps are:
Create a copy of the dataframe
Set variable to be categorical with the order ['b','a','c']
Draw the lineplot on the same ax
The code would be:
order = ['b', 'a', 'c']
df_2 = df_melted.copy()
df_2['variable'] = pd.Categorical(df_2['variable'], order)
df_2.sort_values('variable', inplace=True)
#plot
fig, ax1 = plt.subplots()
sns.barplot(x='variable', y='value', data=df_melted, capsize=0.1, ax=ax1,
order=order)
sns.lineplot(x='variable', y='value', data=df_2,
ax=ax1, color='r', marker='+', linewidth=5, ci=None)
plt.show()
That will produce:
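To see the reordering step in isolation, here is a small sketch with a shortened version of the data. It only shows that converting `variable` to a categorical with explicit categories makes `sort_values` follow that custom order instead of alphabetical order.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 13, 18], 'b': [23, 14, 18], 'c': [38, 17, 12]})
df_melted = df.melt(value_vars=['a', 'b', 'c'])

order = ['b', 'a', 'c']
df_2 = df_melted.copy()
# The categories define the sort order: b first, then a, then c.
df_2['variable'] = pd.Categorical(df_2['variable'], categories=order)
df_2 = df_2.sort_values('variable')

print(df_2['variable'].unique().tolist())
```

The original `df_melted` is left untouched, so the barplot can keep using its own `order` argument.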
I've got a large amount of astronomical data that I need to plot in a scatterplot. I've binned the data according to distance, and I want to plot 4 scatterplots, side by side.
For the purposes of this question, I've constructed an MWE based on what I've got so far (obviously with different data):
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky', 'Jim', 'Lee', 'Rob', 'Dave',
'Jane', 'Bronwyn', 'Karen', 'Liz', 'Claire', 'Chris', 'Jan', 'Ruby'],
'Age':[28,34,29,42,14,16,75,68,
27,3,2,19,17,32,71,45],
'Weight':[60,75,73,82,54,55,98,82,45,9,8,47,54,62,67,67]}
stages = ['Toddler', 'Teen', 'Young Adult', 'Adult']
ages = [0,4,20,40,100]
df = pd.DataFrame(data)
df['binned'] = pd.cut(df['Age'], bins=ages, labels=stages)
fig=plt.figure()
fig.subplots_adjust(hspace=0)
fig.subplots_adjust(wspace=0)
gridsize = 1,4
ax1 = plt.subplot2grid(gridsize, (0,0))
ax1.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax1.set_ylabel('Weight, kg', fontsize=20)
ax1.set_xlabel('Name', fontsize=20)
ax2 = plt.subplot2grid(gridsize, (0,1), sharey=ax1, sharex = ax1)
plt.setp(ax2.get_yticklabels(), visible=False)
ax2.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax2.set_xlabel('Name', fontsize=20)
ax3 = plt.subplot2grid(gridsize, (0,2), sharey=ax1, sharex = ax1)
plt.setp(ax3.get_yticklabels(), visible=False)
ax3.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax3.set_xlabel('Name', fontsize=20)
ax4 = plt.subplot2grid(gridsize, (0,3), sharey=ax1, sharex = ax1)
plt.setp(ax4.get_yticklabels(), visible=False)
ax4.scatter(df['Name'], df['Weight'], alpha = 0.5)
ax4.set_xlabel('Name', fontsize=20)
This plots four graphs as expected:
but how do I get each graph to plot only the data from one of each of the bins? In other words, how do I plot just one of the bins?
I'm not worried about the scrunching up of the names on the x axis, those are just for this MWE. They'll be numbers in my actual plots.
Just for clarification, my actual data is binned like this:
sources['z bins']=pd.cut(sources['z'], [0,1,2,3, max(z)],
labels = ['z < 1', '1 < z < 2', '2 < z < 3', 'z > 3'])
What if you grouped the dataframe by binned and then plotted each group?
For example:
fig=plt.figure()
fig.subplots_adjust(hspace=0)
fig.subplots_adjust(wspace=0)
gridsize = 1,4
for i, (name, frame) in enumerate(df.groupby('binned')):
    ax = plt.subplot2grid(gridsize, (0,i))
    ax.scatter(frame['Name'], frame['Weight'], alpha = 0.5)
    ax.set_xlabel(name, fontsize=20)
I realize you will likely want to clean up the labels a bit, but this at least puts the different bins on a different axes object.
You can iterate over a groupby object, which yields the name of each group together with the dataframe for that group. Here I am using enumerate to step through the subplot positions.
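A minimal sketch of that iteration, without any plotting, using a few made-up rows binned the same way as in the question. The `sizes` dictionary is only there to show what each step of the loop yields.

```python
import pandas as pd

df = pd.DataFrame({'Age': [28, 3, 14, 75, 2, 34],
                   'Weight': [60, 9, 54, 98, 8, 75]})
stages = ['Toddler', 'Teen', 'Young Adult', 'Adult']
df['binned'] = pd.cut(df['Age'], bins=[0, 4, 20, 40, 100], labels=stages)

sizes = {}
# Each iteration yields (group name, sub-DataFrame for that group).
# observed=True skips bins that contain no rows.
for i, (name, frame) in enumerate(df.groupby('binned', observed=True)):
    sizes[name] = len(frame)
    print(i, name, len(frame))
```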
Alternatively, if you do not want to use a for loop, you can access each group with the get_group method of a groupby object:
grouped = df.groupby('binned')
ax1 = plt.subplot2grid(gridsize, (0,0))
ax1.scatter(grouped.get_group('Toddler')['Name'],
grouped.get_group('Toddler')['Weight'],
alpha = 0.5)
ax1.set_ylabel('Weight, kg', fontsize=20)
ax1.set_xlabel('Name', fontsize=20)
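Another option, sketched below, is to create all four axes at once with plt.subplots and sharey=True, then pair each axes with one group. This is an alternative to subplot2grid rather than what the snippets above do. It uses made-up rows and plots Age on the x-axis as a stand-in for the numeric values mentioned in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Age': [28, 34, 14, 16, 75, 68, 3, 2],
                   'Weight': [60, 75, 54, 55, 98, 82, 9, 8]})
stages = ['Toddler', 'Teen', 'Young Adult', 'Adult']
df['binned'] = pd.cut(df['Age'], bins=[0, 4, 20, 40, 100], labels=stages)

# One row, one column per bin, a common y-axis, no gap between panels.
fig, axes = plt.subplots(1, len(stages), sharey=True,
                         gridspec_kw={'wspace': 0})

# observed=False keeps one group per label, even if a bin were empty.
for ax, (name, frame) in zip(axes, df.groupby('binned', observed=False)):
    ax.scatter(frame['Age'], frame['Weight'], alpha=0.5)
    ax.set_xlabel(name)

axes[0].set_ylabel('Weight, kg')
```

With sharey=True the inner y tick labels are hidden automatically, so the plt.setp calls from the question are not needed in this variant.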
I have a matplotlib pie chart in an IPython notebook with a plt.text series table posted next to it. The problem is that the table is formatted as Series output and not as a nice table. What am I doing wrong?
sumByGroup = df['dollar charge'].groupby(df['location']).sum().astype('int')
sumByGroup.plot(kind='pie', title='DOLLARS', autopct='%1.1f%%')
plt.axis('off')
plt.text(2, -0.5, sumByGroup, size=12)
I think the problem is that you're calling groupby on df['dollar charge'] rather than on the df as a whole. Try this instead:
sumByGroup = df.groupby(df['location']).sum().astype('int')
sumByGroup.plot(y='dollar charge', kind='pie', title='DOLLARS', autopct='%1.1f%%')
plt.axis('off')
plt.text(2, -0.5, sumByGroup, size=12)
Full working example with made-up data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 20
locations = ['MD', 'DC', 'VA', 'NC', 'NY']
df = pd.DataFrame({'dollar charge': np.random.randint(28, 53, n),
'location': np.random.choice(locations, n),
'Col A': np.random.randint(-5, 5, n),
'Col B': np.random.randint(-5, 5, n)})
sumByGroup = df.groupby(df['location']).sum()
fig, ax = plt.subplots()
sumByGroup.plot(y='dollar charge', kind='pie', title='DOLLARS',
autopct='%1.1f%%', legend=False, ax=ax)
ax.axis('off')
ax.text(2, -0.5, sumByGroup, size=12)
ax.set_aspect('equal')
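If the goal is specifically to avoid the Series footer (Name and dtype) or to get a real table, two possibilities are sketched below with fixed made-up numbers. The first passes `to_string()` output to ax.text. The second uses matplotlib's own ax.table. The positions and the `sum_by_group` name are assumptions chosen for illustration and will likely need tuning for a real figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'dollar charge': [30, 45, 28, 52, 40, 35],
                   'location': ['MD', 'DC', 'MD', 'VA', 'DC', 'VA']})
sum_by_group = df.groupby('location')['dollar charge'].sum()

fig, ax = plt.subplots()
sum_by_group.plot(kind='pie', title='DOLLARS', autopct='%1.1f%%', ax=ax)
ax.set_ylabel('')

# Option 1: plain text without the "Name: ..., dtype: ..." footer.
ax.text(1.3, -1.0, sum_by_group.to_string(), family='monospace', size=12)

# Option 2: a matplotlib table to the right of the pie (axes coordinates).
tbl = ax.table(cellText=[[str(v)] for v in sum_by_group],
               rowLabels=list(sum_by_group.index),
               colLabels=['dollar charge'],
               bbox=[1.2, 0.3, 0.4, 0.4])
```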