How to plot stats.probplot in a grid? - matplotlib

I have a data frame with four columns. I would like to plot the normality test (probability plot) for each column in a 2x2 grid, but only one plot is drawn and the other subplots stay empty.
import random
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
fig, axs = plt.subplots(2,2, figsize=(15, 6), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace = .5, wspace=.001)
data = {'col1': [random.randrange(1, 50, 1) for i in range(1000)], 'col2': [random.randrange(1, 50, 1) for i in range(1000)],'col3':[random.randrange(1, 50, 1) for i in range(1000)]
,'col4':[random.randrange(1, 50, 1) for i in range(1000)]}
df = pd.DataFrame(data)
for ax, d in zip(axs.ravel(), df):
    ax=stats.probplot(df[d], plot=plt)
    #ax.set_title(str(d))
plt.show()
Is there a way to create the subplots and call stats.probplot within a loop?

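In your code, plot=plt makes every call draw on pyplot's current Axes, and the assignment also overwrites the loop variable ax with probplot's return value. Pass each Axes to probplot instead, so change the for loop to this:
for ax, d in zip(axs.ravel(), df):
    stats.probplot(df[d], plot=ax)
    #ax.set_title(str(d))
plt.show()
I hope this helps you move on.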
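For reference, here is a minimal self-contained sketch of the loop above with a title on each subplot. The random data, the seed, and the use of the correlation coefficient r in the title are illustrative assumptions and not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# Illustrative data: four columns of random integers (an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 50, size=(1000, 4)),
                  columns=["col1", "col2", "col3", "col4"])

fig, axs = plt.subplots(2, 2, figsize=(12, 8))
for ax, col in zip(axs.ravel(), df.columns):
    # plot=ax tells probplot to draw on this specific Axes
    (osm, osr), (slope, intercept, r) = stats.probplot(df[col], dist="norm", plot=ax)
    # probplot sets its own title, so set ours afterwards
    ax.set_title(f"{col} (r = {r:.3f})")
fig.tight_layout()
```

In an interactive session, finish with plt.show() as in the original code.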

Related

Python pyplot scatter is not using colors

I am trying to plot a scatter chart with pandas and matplotlib.pyplot. The dots in the graph all use one color, while the legend shows three different colors for three different groups of data.
Below is my code and a screenshot. You can see that all the dots are green. Could anyone point out why? What did I do wrong?
Thanks a lot in advance.
import pandas as pd
import matplotlib.pyplot as plt
data = {
'x':[1,2,3,4,1,3,7,5],
'y':[10, 20, 30, 40, 20, 30, 40, 80],
'label':['A', 'A','B','B','A','C','C','A']
}
df = pd.DataFrame(data)
plt.figure(figsize=(34,8))
fig,ax = plt.subplots()
#sns.scatterplot(data=df, hue='label', x='x', y='y')
for k, d in df.groupby('label'):
    ax.scatter(df['x'], df['y'], label=k)
plt.legend()
plt.show()
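Inside the loop you plot the whole data frame (df['x'], df['y']) for every group, so the last group's color covers all the points; plot the group d instead. You can also add an explicit colors mapping. Slight modifications to your code after adding a colors iterator: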
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
data = {
'x':[1,2,3,4,1,3,7,5],
'y':[10, 20, 30, 40, 20, 30, 40, 80],
'label':['A', 'A','B','B','A','C','C','A']
}
df = pd.DataFrame(data)
#plt.figure(figsize=(34,8))
fig,ax = plt.subplots()
df1 = df.groupby('label')
colors = iter(cm.rainbow(np.linspace(0, 1, len(df1.groups))))
for k, d in df1:
    ax.scatter(d['x'], d['y'], label=k, color=next(colors))
plt.legend()
plt.show()
outputs the scatter plot as:
Is this your desired output?
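If fixed colors per label are preferred over a colormap, a plain dictionary works too. The sketch below is one possible variant; the palette dictionary and its color names are hypothetical choices, not something from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 1, 3, 7, 5],
    "y": [10, 20, 30, 40, 20, 30, 40, 80],
    "label": ["A", "A", "B", "B", "A", "C", "C", "A"],
})

# Hypothetical label-to-color mapping (an assumption for illustration)
palette = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

fig, ax = plt.subplots()
for label, group in df.groupby("label"):
    # Plot only the rows of this group, not the whole DataFrame
    ax.scatter(group["x"], group["y"], label=label, color=palette[label])
ax.legend(title="label")
```

Without the color argument, matplotlib would still cycle through its default colors, one per scatter call, as long as each call receives only its own group.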

Matplotlib--scatter plot with half filled markers

Question: Using a scatter plot in matplotlib, is there a simple way to get a half-filled marker?
I know half-filled markers can easily be done using a line plot, but I would like to use 'scatter' because I want to use marker size and color (i.e., alternate marker face color) to represent other data. (I believe this will be easier with a scatter plot since I want to automate making a large number of plots from a large data set.)
I can't seem to make half-filled markers properly using a scatter plot. That is to say, instead of a half-filled marker, the plot shows half of a marker. I've been using matplotlib.markers.MarkerStyle, but that seems to only get me halfway there. I'm able to get the following output using the code below.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
plt.scatter(1, 1, marker=MarkerStyle('o', fillstyle='full'), edgecolors='k', s=500)
plt.scatter(2, 2, marker=MarkerStyle('o', fillstyle='left'), edgecolors='k', s=500)
plt.scatter(3, 3, marker=MarkerStyle('o', fillstyle='right'), edgecolors='k', s=500)
plt.scatter(4, 4, marker=MarkerStyle('o', fillstyle='top'), edgecolors='k', s=500)
plt.scatter(5, 5, marker=MarkerStyle('o', fillstyle='bottom'), edgecolors='k', s=500)
plt.show()
As mentioned in the comments, I don't see why you have to use plt.scatter but if you want to, you can fake a combined marker:
from matplotlib.markers import MarkerStyle
from matplotlib import pyplot as plt
#data generation
import pandas as pd
import numpy as np
np.random.seed(123)
n = 10
df = pd.DataFrame({"X": np.random.randint(1, 20, n),
                   "Y": np.random.randint(10, 30, n),
                   "S": np.random.randint(50, 500, n),
                   "C1": np.random.choice(["red", "blue", "green"], n),
                   "C2": np.random.choice(["yellow", "grey"], n)})
fig, ax = plt.subplots()
ax.scatter(df.X, df.Y, s=df.S, c=df.C1, edgecolor="black", marker=MarkerStyle("o", fillstyle="right"))
ax.scatter(df.X, df.Y, s=df.S, c=df.C2, edgecolor="black", marker=MarkerStyle("o", fillstyle="left"))
plt.show()
Sample output:
This works, of course, also for continuous data:
from matplotlib import pyplot as plt
from matplotlib.markers import MarkerStyle
import pandas as pd
import numpy as np
np.random.seed(123)
n = 10
df = pd.DataFrame({"X": np.random.randint(1, 20, n),
                   "Y": np.random.randint(10, 30, n),
                   "S": np.random.randint(100, 1000, n),
                   "C1": np.random.randint(1, 100, n),
                   "C2": np.random.random(n)})
fig, ax = plt.subplots(figsize=(10,8))
im1 = ax.scatter(df.X, df.Y, s=df.S, c=df.C1, edgecolor="black", marker=MarkerStyle("o", fillstyle="right"), cmap="autumn")
im2 = ax.scatter(df.X, df.Y, s=df.S, c=df.C2, edgecolor="black", marker=MarkerStyle("o", fillstyle="left"), cmap="winter")
cbar1 = plt.colorbar(im1, ax=ax)
cbar1.set_label("right half", rotation=90)
cbar2 = plt.colorbar(im2, ax=ax)
cbar2.set_label("left half", rotation=90)
plt.show()
Sample output:
But be reminded that plt.plot with marker definitions might be faster for large-scale datasets: The plot function will be faster for scatterplots where markers don't vary in size or color.
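For the case where all markers share one size and one pair of colors, a sketch with ax.plot could look like the following. The coordinates and the colors are made-up illustration values; fillstyle, markerfacecolor and markerfacecoloralt are regular Line2D properties.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

fillstyles = ["left", "right", "top", "bottom"]

fig, ax = plt.subplots()
for i, fs in enumerate(fillstyles, start=1):
    ax.plot(
        i, i,
        linestyle="none",             # markers only, no connecting line
        marker="o",
        markersize=20,
        fillstyle=fs,                 # which half gets markerfacecolor
        markerfacecolor="tab:red",
        markerfacecoloralt="gold",    # color of the other half
        markeredgecolor="black",
    )
ax.margins(0.2)
```

Here both halves are drawn by a single artist, so no second overlaid scatter call is needed, but size and colors are fixed per plot call.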

How to plot lines linking medians of multiple violin distributions in seaborn?

I am struggling to plot a dotted line between the median values (and the min and max) of each type across stacked violin distributions.
I tried superposing a violin plot with a seaborn.lineplot, but it failed. I'm not sure this approach lets me draw dotted lines and also link the min and max of distributions of the same type. I also tried using seaborn.lineplot alone, but there the challenge is to plot the min and max of the distribution at each x-axis value.
Here is an example dataset and the code for the violin plot in seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
x=[0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8]
cate=['a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c']
y=[1.1,1.12,1.13,1.13,3.1,3.12,3.13,3.13,5.1,5.12,5.13,5.13,2.2,2.22,2.25,2.23,4.2,4.22,4.25,4.23,6.2,6.22,6.25,6.23,2.2,2.22,2.24,2.23,4.2,4.22,4.24,4.23,6.2,6.22,6.24,6.23,1.1,1.13,1.14,1.12,3.1,3.13,3.14,3.12,5.1,5.13,5.14,5.12]
my_pal =['red','green', 'purple']
df = pd.DataFrame({'x': x, 'Type': cate, 'y': y})
ax=sns.catplot(y='y', x='x',data=df, hue='Type', palette=my_pal, kind="violin",dodge =False)
sns.lineplot(y='y', x='x',data=df, hue='Type', palette=my_pal, ci=100,legend=False)
plt.show()
but it plots the lines only over a reduced part on the left of the plot. Is there a trick to superpose a lineplot on a violin plot?
For the line plot, 'x' is considered numerical. However, for the violin plot 'x' is considered categorical (positioned at 0, 1, 2, ...).
A solution is to convert 'x' to strings to have both plots consider it as categorical.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
my_pal = ['red', 'green', 'purple']
N = 40
df = pd.DataFrame({'x': np.random.randint(1, 6, N*3) * 0.2,
                   'y': np.random.uniform(0, 1, N*3) + np.tile([2, 4, 6], N),
                   'Type': np.tile(list('abc'), N)})
df['x'] = [f'{x:.1f}' for x in df['x']]
ax = sns.violinplot(y='y', x='x', data=df, hue='Type', palette=my_pal, dodge=False)
ax = sns.lineplot(y='y', x='x', data=df, hue='Type', palette=my_pal, ci=100, legend=False, ax=ax)
ax.margins(0.15) # slightly more padding for x and y axis
ax.legend(bbox_to_anchor=(1.01, 1), loc='upper left')
plt.tight_layout()
plt.show()
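Note that lineplot aggregates with the mean by default. If explicit median, min and max lines are wanted, one possible approach is to compute them with pandas and draw them at the categorical positions 0, 1, 2, ... yourself. The sketch below makes that assumption; the summary frame, the positions mapping and the line styles are hypothetical helpers for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
N = 40
df = pd.DataFrame({
    "x": rng.integers(1, 6, N * 3) * 0.2,
    "y": rng.uniform(0, 1, N * 3) + np.tile([2, 4, 6], N),
    "Type": np.tile(list("abc"), N),
})
df["x"] = df["x"].map("{:.1f}".format)  # strings, so x is categorical

order = sorted(df["x"].unique())
hue_order = ["a", "b", "c"]
my_pal = ["red", "green", "purple"]

# min / median / max of y for every (Type, x) combination
summary = (df.groupby(["Type", "x"])["y"]
             .agg(["min", "median", "max"])
             .reset_index())

ax = sns.violinplot(data=df, x="x", y="y", hue="Type", order=order,
                    hue_order=hue_order, palette=my_pal, dodge=False)

# Categories sit at integer positions 0..n-1 on the x-axis
positions = {label: i for i, label in enumerate(order)}
for kind, color in zip(hue_order, my_pal):
    g = summary[summary["Type"] == kind].sort_values("x")
    xs = g["x"].map(positions).to_numpy()
    ax.plot(xs, g["median"].to_numpy(), "o--", color=color)  # dashed median line
    ax.plot(xs, g["min"].to_numpy(), ":", color=color)       # dotted min line
    ax.plot(xs, g["max"].to_numpy(), ":", color=color)       # dotted max line
```

Passing order explicitly keeps the violin positions and the manually drawn lines in agreement.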

How to add droplines to a seaborn scatterplot?

Using the following example code in a Jupyter notebook:
import pandas as pd
import seaborn as sns
import numpy as np
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
df = pd.DataFrame(np.random.rand(5, 2), columns=['a', 'b'])
sns.set()
g = sns.relplot(data=df, x='a', y='b', kind='scatter');
g.set(xlim=(0, 1))
g.set(ylim=(0, 1));
The resulting plot shows the data points, but I would also like to have vertical drop lines and occasionally horizontal ones as well. To clarify what I mean by droplines, here is a mockup of the actual vs. the desired output:
Update: A little more complex input that makes it harder to manually draw the lines:
import pandas as pd
import seaborn as sns
import numpy as np
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
df = pd.DataFrame(np.random.rand(20, 3), columns=['a', 'b', 'c'])
df['d'] = ['apples', 'bananas', 'cherries', 'dates'] * 5
sns.set()
g = sns.relplot(data=df, x='a', y='b', hue='c', col='d', col_wrap=2, kind='scatter');
g.set(xlim=(0, 1))
g.set(ylim=(0, 1));
There are several ways to plot vertical/horizontal lines. One of them is to use hlines or vlines. This can be done in a loop for the sake of ease.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(121)
sns.set()
fig, ax = plt.subplots()
df = pd.DataFrame(np.random.rand(5, 2), columns=['a', 'b'])
# scatterplot is the axes-level function and accepts ax; relplot is figure-level and does not
sns.scatterplot(data=df, x='a', y='b', color='blue', ax=ax)
for x, y in zip(df['a'], df['b']):
    ax.hlines(y, 0, x, color='blue')  # horizontal dropline to the y-axis
    ax.vlines(x, 0, y, color='blue')  # vertical dropline to the x-axis
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
plt.show()
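Since hlines and vlines accept arrays, the loop is optional. A minimal sketch with plain matplotlib, assuming the same two-column layout (the random data and seed are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(121)
df = pd.DataFrame(rng.random((5, 2)), columns=["a", "b"])

fig, ax = plt.subplots()
ax.scatter(df["a"], df["b"], color="blue")
# One call draws all vertical droplines from y=0 up to each point
vl = ax.vlines(df["a"], 0, df["b"], color="blue", linewidth=1)
# One call draws all horizontal droplines from x=0 across to each point
hl = ax.hlines(df["b"], 0, df["a"], color="blue", linewidth=1)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
```

For the faceted relplot from the update, one option would be to wrap the two calls in a small function that accepts data and color keywords and pass it to g.map_dataframe, which calls it once per facet with that facet's rows.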

Arrange two plots horizontally

As an exercise, I'm reproducing a plot from The Economist with matplotlib.
So far, I can generate random data and produce the two plots independently. I'm now struggling to put them next to each other horizontally.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
df1 = pd.DataFrame({"broadcast": np.random.randint(110, 150,size=8),
                    "cable": np.random.randint(100, 250, size=8),
                    "streaming" : np.random.randint(10, 50, size=8)},
                   index=pd.Series(np.arange(2009,2017),name='year'))
df1.plot.bar(stacked=True)
df2 = pd.DataFrame({'usage': np.sort(np.random.randint(1,50,size=7)),
                    'avg_hour': np.sort(np.random.randint(0,3, size=7) + np.random.ranf(size=7))},
                   index=pd.Series(np.arange(2009,2016),name='year'))
plt.figure()
fig, ax1 = plt.subplots()
ax1.plot(df2['avg_hour'])
ax2 = ax1.twinx()
ax2.bar(range(2009,2016), df2['usage'])
plt.show()
You should try using subplots. First create a figure with plt.figure(). Then add a subplot with add_subplot(121), where the first 1 is the number of rows, 2 is the number of columns, and the last 1 selects the first plot. Plot the first dataframe on it; note that you should pass the created axis ax1. Then add the second subplot with add_subplot(122) and repeat for the second dataframe. I renamed your axis ax2 to ax3, since there are now three axes on one figure. The code below produces what I believe you are looking for. You can then work on the aesthetics of each plot separately.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
df1 = pd.DataFrame({"broadcast": np.random.randint(110, 150,size=8),
                    "cable": np.random.randint(100, 250, size=8),
                    "streaming" : np.random.randint(10, 50, size=8)},
                   index=pd.Series(np.arange(2009,2017),name='year'))
ax1 = fig.add_subplot(121)
df1.plot.bar(stacked=True,ax=ax1)
df2 = pd.DataFrame({'usage': np.sort(np.random.randint(1,50,size=7)),
                    'avg_hour': np.sort(np.random.randint(0,3, size=7) + np.random.ranf(size=7))},
                   index=pd.Series(np.arange(2009,2016),name='year'))
ax2 = fig.add_subplot(122)
ax2.plot(df2['avg_hour'])
ax3 = ax2.twinx()
# bar() takes the x positions first; current Matplotlib versions no longer accept the old left= keyword
ax3.bar(range(2009,2016), df2['usage'])
plt.show()
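The same side-by-side layout can also be requested in one call with plt.subplots(1, 2). The following is a sketch of that variant; the figure size, colors, alpha value and variable names such as ax_left and ax_twin are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame({"broadcast": rng.integers(110, 150, size=8),
                    "cable": rng.integers(100, 250, size=8),
                    "streaming": rng.integers(10, 50, size=8)},
                   index=pd.Index(np.arange(2009, 2017), name="year"))
df2 = pd.DataFrame({"usage": np.sort(rng.integers(1, 50, size=7)),
                    "avg_hour": np.sort(rng.integers(0, 3, size=7) + rng.random(7))},
                   index=pd.Index(np.arange(2009, 2016), name="year"))

# One row, two columns: the two Axes sit next to each other
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 4))

df1.plot.bar(stacked=True, ax=ax_left)            # left panel: stacked bars
ax_right.plot(df2.index, df2["avg_hour"], color="tab:red")
ax_twin = ax_right.twinx()                        # second y-axis on the right panel
ax_twin.bar(df2.index, df2["usage"], alpha=0.4)   # semi-transparent so the line stays visible
fig.tight_layout()
```

As in the answer above, the figure ends up with three Axes: the left panel, the right panel, and the twin of the right panel.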