How to output multiple graphs using sns.distplot() - matplotlib

I have 3 lists of data.
If I do:
sns.distplot(data1)
sns.distplot(data2)
sns.distplot(data3)
I'm going to get a single graph with 3 different distributions on the same graph.
I'd like to output 3 individual distributions. How do I do this without using subplots? I find that subplots are too cramped and small.
Thanks

You just need to set the figsize accordingly when creating subplots:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
data1, data2, data3 = [np.random.normal(size=[100]) for _ in range(3)]
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))
sns.distplot(data1, ax=ax1)
sns.distplot(data2, ax=ax2)
sns.distplot(data3, ax=ax3)
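If subplots are still not wanted, another option is to open a new figure for each dataset, since every plotting call otherwise draws on the current axes. The following is only a sketch: the helper name plot_separately is hypothetical, and it uses plain matplotlib histograms so that it does not depend on a particular seaborn version. Calling sns.distplot(data, ax=ax) in place of ax.hist should work the same way.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_separately(datasets, figsize=(8, 5)):
    """Draw each dataset on its own figure and return the figures.

    Hypothetical helper, not part of matplotlib or seaborn.
    """
    figures = []
    for i, data in enumerate(datasets, start=1):
        # A fresh figure per dataset keeps the distributions apart.
        fig, ax = plt.subplots(figsize=figsize)
        ax.hist(data, bins=20, density=True)
        ax.set_title(f"data{i}")
        figures.append(fig)
    return figures


rng = np.random.default_rng(0)
data1, data2, data3 = [rng.normal(size=100) for _ in range(3)]
figs = plot_separately([data1, data2, data3])
# Calling plt.show() at this point displays the three figures separately.
```

Each figure can be sized independently through figsize, which avoids the cramped look of a shared subplot grid.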


Directly annotate matplotlib stacked bar graph [duplicate]

I would like to annotate a bar chart so that each bar's value can be compared with two reference values. An overlay such as the one shown in the picture, a kind of staff gauge, is possible, but I'm open to more elegant solutions.
The bar chart is generated with the pandas API to matplotlib (e.g. data.plot(kind="bar")), so ideally the solution would play nicely with that.
You may use smaller bars for the target and benchmark indicators. Pandas cannot annotate bars automatically, but you can simply loop over the values and use matplotlib's pyplot.annotate instead.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = np.random.randint(5,15, size=5)
t = (a+np.random.normal(size=len(a))*2).round(2)
b = (a+np.random.normal(size=len(a))*2).round(2)
df = pd.DataFrame({"a":a, "t":t, "b":b})
fig, ax = plt.subplots()
df["a"].plot(kind='bar', ax=ax, legend=True)
df["b"].plot(kind='bar', position=0., width=0.1, color="lightblue",legend=True, ax=ax)
df["t"].plot(kind='bar', position=1., width=0.1, color="purple", legend=True, ax=ax)
for i, rows in df.iterrows():
    plt.annotate(rows["a"], xy=(i, rows["a"]), rotation=0, color="C0")
    plt.annotate(rows["b"], xy=(i+0.1, rows["b"]), color="lightblue", rotation=+20, ha="left")
    plt.annotate(rows["t"], xy=(i-0.1, rows["t"]), color="purple", rotation=-20, ha="right")
ax.set_xlim(-1,len(df))
plt.show()
As far as I am aware, there's no direct way to annotate a bar plot. Some time ago I needed to annotate one, so I wrote this; perhaps you can adapt it to your needs.
import matplotlib.pyplot as plt
import numpy as np
ax = plt.subplot(111)
ax.set_xlim(-0.2, 3.2)
ax.grid(True, which='major', color='k', linestyle=':', lw=.5, zorder=1)
# x,y data
x = np.arange(4)
y = np.array([5, 12, 3, 7])
# Define upper y limit leaving space for the text above the bars.
up = max(y) * .03
ax.set_ylim(0, max(y) + 3 * up)
ax.bar(x, y, align='center', width=0.2, color='g', zorder=4)
# Add text to bars
for xi, yi, l in zip(x, y, map(str, y)):
    ax.text(xi - len(l) * .02, yi + up, l,
            bbox=dict(facecolor='w', edgecolor='w', alpha=.5))
ax.set_xticks(x)
ax.set_xticklabels(['text1', 'text2', 'text3', 'text4'])
ax.tick_params(axis='x', which='major', labelsize=12)
plt.show()
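Newer matplotlib releases (3.4 and later) ship a helper for this, Axes.bar_label, which places a label on each bar of a bar container. A minimal sketch, assuming matplotlib >= 3.4 and reusing the same illustrative values as above:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(4)
y = np.array([5, 12, 3, 7])

fig, ax = plt.subplots()
bars = ax.bar(x, y, width=0.2, color="g")

# bar_label (matplotlib >= 3.4) writes each bar's height just above it.
texts = ax.bar_label(bars, padding=3)

ax.set_xticks(x)
ax.set_xticklabels(["text1", "text2", "text3", "text4"])
ax.margins(y=0.1)  # leave some headroom for the labels
```

On older matplotlib versions this call is not available, and a manual loop over the bars like the one above is still needed.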

Making a Scatter Plot from a DataFrame in Pandas

I have a DataFrame and need to make a scatter-plot from it.
I need to use 2 columns as the x-axis and y-axis and only need to plot 2 rows from the entire dataset. Any suggestions?
For example, my dataframe is below (50 states x 4 columns). I need to plot 'rgdp_change' on the x-axis vs 'diff_unemp' on the y-axis, and only need to plot for the states, "Michigan" and "Wisconsin".
So from the dataframe, you'll need to select the rows from a list of the states you want: ['Michigan', 'Wisconsin']
I also figured you would probably want a legend or some way to differentiate one point from the other. To do this, we create a colormap assigning a different color to each state. This way the code is generalizable for more than those two states.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# generate a random df with the relevant rows, columns to your actual df
df = pd.DataFrame({'State':['Alabama', 'Alaska', 'Michigan', 'Wisconsin'], 'real_gdp':[1.75*10**5, 4.81*10**4, 2.59*10**5, 1.04*10**5],
'rgdp_change': [-0.4, 0.5, 0.4, -0.5], 'diff_unemp': [-1.3, 0.4, 0.5, -11]})
fig, ax = plt.subplots()
states = ['Michigan', 'Wisconsin']
colormap = cm.viridis
colorlist = [colors.rgb2hex(colormap(i)) for i in np.linspace(0, 0.9, len(states))]
for state, c in zip(states, colorlist):
    # select the row by state name so the legend label always matches the point
    row = df.loc[df["State"] == state]
    ax.scatter(row["rgdp_change"], row["diff_unemp"], label=state, s=50, linewidth=0.1, color=c)
ax.legend()
plt.show()
Use the DataFrame plot method, but first filter the states you need with the index's isin method (this assumes the state names are the index):
states = ["Michigan", "Wisconsin"]
df[df.index.isin(states)].plot(kind='scatter', x='rgdp_change', y='diff_unemp')
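If the state names live in a regular column rather than in the index, the same filter can be applied to that column. A sketch, assuming a 'State' column and the made-up values from the example above, that also labels each point with its state:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data, standing in for the real 50-state frame.
df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Michigan", "Wisconsin"],
    "rgdp_change": [-0.4, 0.5, 0.4, -0.5],
    "diff_unemp": [-1.3, 0.4, 0.5, -11],
})

states = ["Michigan", "Wisconsin"]
subset = df[df["State"].isin(states)]

ax = subset.plot(kind="scatter", x="rgdp_change", y="diff_unemp")

# Write the state name next to each point.
for _, row in subset.iterrows():
    ax.annotate(row["State"], xy=(row["rgdp_change"], row["diff_unemp"]),
                xytext=(5, 5), textcoords="offset points")
```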

Matplotlib Bar Graph Yaxis not being set to 0 [duplicate]

My DataFrame's structure
trx.columns
Index(['dest', 'orig', 'timestamp', 'transcode', 'amount'], dtype='object')
I'm trying to plot transcode (transaction code) against amount to see how much money is spent per transaction. I made sure to convert transcode to a categorical type, as seen below.
trx['transcode']
...
Name: transcode, Length: 21893, dtype: category
Categories (3, int64): [1, 17, 99]
The result I get from doing plt.scatter(trx['transcode'], trx['amount']) is a scatter plot [image omitted].
While the above plot is not entirely wrong, I would like the X axis to contain just the three possible values of transcode [1, 17, 99] instead of the entire [1, 100] range.
Thanks!
Since matplotlib 2.1 you can plot categorical variables by using strings, i.e. if you provide the column for the x values as strings, matplotlib will treat them as categories.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
"y" : np.random.rand(100)*100})
plt.scatter(df["x"].astype(str), df["y"])
plt.margins(x=0.5)
plt.show()
To obtain the same in matplotlib <= 2.0, one would plot against an index instead.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"x" : np.random.choice([1,17,99], size=100),
"y" : np.random.rand(100)*100})
u, inv = np.unique(df["x"], return_inverse=True)
plt.scatter(inv, df["y"])
plt.xticks(range(len(u)),u)
plt.margins(x=0.5)
plt.show()
The same plot can be obtained using seaborn's stripplot:
import seaborn as sns
sns.stripplot(x="x", y="y", data=df)
And a potentially nicer representation can be done via seaborn's swarmplot:
sns.swarmplot(x="x", y="y", data=df)
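Since the column in the question is already a pandas categorical, its integer codes can serve as the x positions and its categories as the tick labels. This is a sketch of the index-based approach above using the .cat accessor; the frame trx below is randomly generated stand-in data, not the asker's data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Stand-in data: a categorical transcode column and random amounts.
trx = pd.DataFrame({
    "transcode": pd.Categorical(rng.choice([1, 17, 99], size=100),
                                categories=[1, 17, 99]),
    "amount": rng.random(100) * 100,
})

codes = trx["transcode"].cat.codes            # integer position of each row's category
categories = trx["transcode"].cat.categories  # the distinct category values

fig, ax = plt.subplots()
ax.scatter(codes, trx["amount"])
ax.set_xticks(range(len(categories)))
ax.set_xticklabels([str(c) for c in categories])
ax.margins(x=0.5)
```

The categories keep the order defined on the column, so the ticks appear as 1, 17, 99 regardless of the order of the rows.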

Combine two dataframe boxplots in a twinx figure

I want to display two Pandas dataframes within one figure as boxplots.
As each of the two dataframes has different value range, I would like to have them combined in a twinx figure.
Reduced to the minimum, I have tried the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF'))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
df1.boxplot(ax=ax1)
df2.boxplot(ax=ax2)
plt.show()
The result is expectedly not what it should look like (there should be 6 boxes on the plot, actually!)
How can I manage to have the boxplots next to each other?
I tried to set some dummy scatter points on ax1 and ax2, but this did not really help.
One solution is to concatenate the data frames for plotting and to use a mask. To create the mask, we use (dfs == dfs) | dfs.isnull() to build a full matrix of True values, and then exclude the columns named 'E' and 'F'. This gives a 2D boolean matrix that lets you plot only the first four boxes, as the last two are masked (their ticks still appear at the bottom). With the inverse mask ~mask you plot the last two on their own axis and mask the first four.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint( 0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF' ))
dfs = pd.concat([df1, df2])
mask = ((dfs == dfs) | dfs.isnull()) & (dfs.columns != 'E') & (dfs.columns != 'F')
fig, ax1 = plt.subplots()
# columns A-D on the left y-axis
dfs[mask].boxplot(ax=ax1)
ax2 = ax1.twinx()
# columns E and F on the right y-axis
dfs[~mask].boxplot(ax=ax2)
plt.show()
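An alternative that avoids the mask is to call matplotlib's Axes.boxplot directly and give each data frame its own block of x positions. This is a sketch under the same assumptions as the example above (random data, four columns on the left axis and two on the right); manage_ticks=False requires matplotlib 3.1 or later.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(rng.integers(100, 200, size=(100, 2)), columns=list("EF"))

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# Separate x positions per frame so the boxes sit next to each other.
pos1 = np.arange(1, len(df1.columns) + 1)              # 1..4
pos2 = np.arange(1, len(df2.columns) + 1) + len(pos1)  # 5..6

# manage_ticks=False stops boxplot from rewriting the shared x ticks.
bp1 = ax1.boxplot(df1.to_numpy(), positions=pos1, manage_ticks=False)
bp2 = ax2.boxplot(df2.to_numpy(), positions=pos2, manage_ticks=False)

# The x-axis is shared, so limits and tick labels are set once.
ax1.set_xlim(0.5, pos2[-1] + 0.5)
ax1.set_xticks(np.concatenate([pos1, pos2]))
ax1.set_xticklabels(list(df1.columns) + list(df2.columns))
ax1.set_ylabel("df1 values")
ax2.set_ylabel("df2 values")
```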

Remove repeated labels in matplotlib legend [duplicate]

If you plot several lines or points with matplotlib, you may end up with repeated labels in the legend. For example:
for i in range(5):
    Y1=boatarrays[i]
    Y2=cararrays[i]
    ax.plot(X,Y1,color='r',label='Boats')
    ax.plot(X,Y2,color='b',label='Cars')
How can I make 'Boats' and 'Cars' appear only once?
import matplotlib.pyplot as plt
#Prepare fig
fig = plt.figure()
ax = fig.add_subplot(111)
for i in range(5):
    Y1=boatarrays[i]
    Y2=cararrays[i]
    ax.plot(X,Y1,color='r',label='Boats')
    ax.plot(X,Y2,color='b',label='Cars')
#Fix legend
hand, labl = ax.get_legend_handles_labels()
handout=[]
lablout=[]
for h,l in zip(hand,labl):
    if l not in lablout:
        lablout.append(l)
        handout.append(h)
fig.legend(handout, lablout)
I prefer to use NumPy functions, which are faster and more compact to write.
import numpy as np
import matplotlib.pyplot as plt
fig,ax = plt.subplots(figsize=(7.5,7.5))
X = np.arange(10)
for i in range(5):
    Y1=np.random.uniform(low=0.0,high=1.0,size=(10)) #boatarrays[i]
    Y2=np.random.uniform(low=0.0,high=1.0,size=(10)) #cararrays[i]
    ax.plot(X,Y1,color='r',label='Boats')
    ax.plot(X,Y2,color='b',label='Cars')
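hand, labl = ax.get_legend_handles_labels()
# keep the first handle of each distinct label so handles and labels stay paired
labels, idx = np.unique(labl, return_index=True)
plt.legend([hand[i] for i in idx], labels)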
plt.tight_layout()
plt.show()
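Another compact way to drop the duplicates is to collect the handles in a dict keyed by label, since a dict holds one entry per key and preserves insertion order. A minimal sketch with random data standing in for boatarrays and cararrays:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
X = np.arange(10)
rng = np.random.default_rng(0)
for i in range(5):
    # Random curves stand in for boatarrays[i] and cararrays[i].
    ax.plot(X, rng.uniform(size=10), color="r", label="Boats")
    ax.plot(X, rng.uniform(size=10), color="b", label="Cars")

handles, labels = ax.get_legend_handles_labels()
# One entry per label; later handles with the same label overwrite earlier ones.
unique = dict(zip(labels, handles))
legend = ax.legend(list(unique.values()), list(unique.keys()))
```

This keeps the labels in the order they were first plotted, whereas sorting-based approaches reorder them alphabetically.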