Figure names in pandas boxplots - pandas

I created two boxplots using pandas.
I then grab a reference to each figure with plt.gcf().
When I try to show the plots, only the last boxplot is shown. It's as if fig1 is being overwritten.
What is the correct way to show both boxplots?
This is the sample code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dates = pd.date_range('20000101', periods=10)
df = pd.DataFrame(index=dates)
df['A'] = np.cumsum(np.random.randn(10))
df['B'] = np.random.randint(-1,2,size=10)
df['C'] = range(1,11)
df['D'] = range(12,22)
# first figure
ax_boxplt1 = df[['A','B']].boxplot()
fig1 = plt.gcf()
# second figure
ax_boxplt2 = df[['C','D']].boxplot()
fig2 = plt.gcf()
# print figures
figures = [fig1,fig2]
for fig in figures:
    print(fig)

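Create a single figure with two axes and plot to each of them separately by passing the ax parameter:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2)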
dates = pd.date_range('20000101', periods=10)
df = pd.DataFrame(index=dates)
df['A'] = np.cumsum(np.random.randn(10))
df['B'] = np.random.randint(-1,2,size=10)
df['C'] = range(1,11)
df['D'] = range(12,22)
# first figure
df[['A','B']].boxplot(ax=axes[0]) # Added `ax` parameter
# second figure
df[['C','D']].boxplot(ax=axes[1]) # Added `ax` parameter
plt.show()

To get two separate figures, create each figure before plotting to it. You can use a number to identify each figure.
plt.figure(1)
# do something with the first figure
plt.figure(2)
# do something with the second figure
Complete example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dates = pd.date_range('20000101', periods=10)
df = pd.DataFrame(index=dates)
df['A'] = np.cumsum(np.random.randn(10))
df['B'] = np.random.randint(-1,2,size=10)
df['C'] = range(1,11)
df['D'] = range(12,22)
# first figure
fig1=plt.figure(1)
ax_boxplt1 = df[['A','B']].boxplot()
# second figure
fig2=plt.figure(2)
ax_boxplt2 = df[['C','D']].boxplot()
plt.show()
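A variant that avoids relying on the "current" figure altogether: create each figure and its axes explicitly and hand the axes to boxplot. As far as I know, boxplot draws onto the current axes when no ax is given, which would explain why both plt.gcf() calls in the question returned the same figure. This is only a sketch under that assumption; the names ax1 and ax2 are mine, not from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range('20000101', periods=10)
df = pd.DataFrame(index=dates)
df['A'] = np.cumsum(np.random.randn(10))
df['B'] = np.random.randint(-1, 2, size=10)
df['C'] = range(1, 11)
df['D'] = range(12, 22)

# One figure/axes pair per boxplot, so nothing depends on plt.gcf()
fig1, ax1 = plt.subplots()
df[['A', 'B']].boxplot(ax=ax1)

fig2, ax2 = plt.subplots()
df[['C', 'D']].boxplot(ax=ax2)

# fig1 and fig2 are now two distinct Figure objects
figures = [fig1, fig2]
for fig in figures:
    print(fig.number)
```

Calling plt.show() afterwards should open both figures, and fig1 and fig2 can be saved or modified independently.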

Related

Building a histogram

How can a distribution histogram similar to this one be constructed based on the data from the table?
Python code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Data.xlsx')
print(df)
df.plot.hist(df)
plt.show()
It isn't clear exactly what the x and y axes of your desired plot are, but hopefully this will get you started. Sometimes trying to come up with an MRE (minimal reproducible example) helps you solve your own problem.
import random
import pandas as pd
import matplotlib.pyplot as plt
#######################################
# generate some random data for a MWE #
#######################################
random.seed(22)
data = [random.randint(0, 100) for _ in range(0, 10)]
data = pd.Series(sorted(data))
freqs = [random.uniform(0, 1) for _ in range(0, 10)]
freqs = sorted(freqs)
freqs = pd.Series(freqs)
df = pd.DataFrame()
df['data'] = data
df['frequencies'] = freqs
###############################################
# Desired bar plot using pandas built in plot #
###############################################
df.plot(x='data', y='frequencies', kind='bar')
plt.show()
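If the table holds raw observations rather than precomputed frequencies, a histogram can bin them directly. A minimal sketch, assuming a hypothetical column named 'value' filled with made-up data (the layout of Data.xlsx is not known):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up raw observations standing in for a column of Data.xlsx
rng = np.random.default_rng(22)
df = pd.DataFrame({'value': rng.normal(loc=50, scale=10, size=200)})

# Series.plot.hist bins the values and returns the matplotlib Axes
ax = df['value'].plot.hist(bins=10, edgecolor='black')
ax.set_xlabel('value')
ax.set_ylabel('count')
```

Follow it with plt.show() to display the plot; the bins argument controls how many bars the distribution is split into.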

How to set x axis according to the numbers in the DATAFRAME

I am using Matplotlib to show a graph of some information that I get from users.
I want the x axis to show the users' IDs and the y axis to show the number of wins they have.
I don't understand how to use the IDs of my users as the x-axis labels.
My code:
import matplotlib.pyplot as plt
import matplotlib,pylab as pylab
import pandas as pd
import numpy as np
#df = pd.read_csv('Players.csv')
df = pd.read_json('Players.json')
# df.groupby('ID').sum()['Win']
axisx = df.groupby('ID').sum()['Win'].keys()
axisy = df.groupby('ID').sum()['Win'].values
fig = pylab.gcf()
# fig.canvas.set_window_title('4 In A Row Statistic')
# img = plt.imread("Oi.jpeg")
# plt.imshow(img)
fig, ax = plt.subplots()
ax.set_xticklabels(axisx.to_list())
plt.title('Game Statistic',fontsize=20,color='r')
plt.xlabel('ID Players',color='r')
plt.ylabel('Wins',color='r')
x = np.arange(len(axisx))
rects = ax.bar(x, axisy, width=0.1)
plt.show()
Use plt.xticks(x, array_of_id), where x holds the bar positions. plt.xticks gets or sets the current tick locations and labels of the x-axis, so call it after the bars have been drawn.
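A minimal sketch of setting the ticks with plt.xticks, using a small made-up table in place of Players.json (the IDs and win values are hypothetical):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for Players.json: one row per game played
df = pd.DataFrame({'ID': [101, 102, 101, 103, 102, 101],
                   'Win': [1, 0, 1, 1, 1, 0]})

# Total wins per player; the index of the result holds the IDs
wins = df.groupby('ID')['Win'].sum()
x = np.arange(len(wins))

fig, ax = plt.subplots()
ax.bar(x, wins.values, width=0.5)

# Put one tick under each bar and label it with the player's ID
plt.xticks(x, wins.index.astype(str))
plt.xlabel('ID Players')
plt.ylabel('Wins')
plt.title('Game Statistic')
```

The tick positions are the bar positions, and the labels are the IDs converted to strings, so the two stay aligned however many players there are.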

error bars at the limits in matplotlib barchart

I'm trying to get the errorbars to show at the confidence interval's limits, and not in the center.
What I want is this:
but what I'm getting is this:
To plot the bar chart I used this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000,200000,3650),
                   np.random.normal(43000,100000,3650),
                   np.random.normal(43500,140000,3650),
                   np.random.normal(48000,70000,3650)],
                  index=[1992,1993,1994,1995])
df1 = df.T
df1.columns = ['1992', '1993','1994','1995']
a = df1.describe()
means = a.loc['mean'].values.tolist()
stdevs = a.loc['std'].values.tolist()
counts = a.loc['count'].values.tolist()
index = np.arange(len(df1.columns))
CI = []
for i in range(len(means)):
    CIval = 1.96*stdevs[i]/(counts[i]**(0.5))
    CI.append(CIval)
#print(means, CI)
plt.figure()
fig, ax = plt.subplots(figsize=(10,10))
ax.set_xticks(index)
ax.set_xticklabels(df1.columns)
plt.bar(index, means, xerr = 0.1, yerr=CI)
plt.tight_layout()
plt.show()
The error bars are drawn as expected: each one is centered on the top of its bar and extends by the CI value above and below the mean. You set xerr=0.1, but your expected image has no horizontal error bar, so that argument can be removed. To make the limits clearly visible, add caps to the error bars with the capsize= argument in the call to plt.bar():
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000,200000,3650),
                   np.random.normal(43000,100000,3650),
                   np.random.normal(43500,140000,3650),
                   np.random.normal(48000,70000,3650)],
                  index=[1992,1993,1994,1995])
df1 = df.T
df1.columns = ['1992', '1993','1994','1995']
a = df1.describe()
means = a.loc['mean'].values.tolist()
stdevs = a.loc['std'].values.tolist()
counts = a.loc['count'].values.tolist()
index = np.arange(len(df1.columns))
CI = []
for i in range(len(means)):
    CIval = 1.96*stdevs[i]/(counts[i]**(0.5))
    CI.append(CIval)
fig, ax = plt.subplots(figsize=(10,10))
ax.set_xticks(index)
ax.set_xticklabels(df1.columns)
plt.bar(index, means, yerr=CI, capsize=10)
plt.tight_layout()
plt.show()
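The interval half-widths can also be computed without the explicit loop. A sketch, assuming the same normal-approximation formula 1.96 * std / sqrt(n) used above; the dict-based construction of df1 and the name ci are my own choices:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(12345)
df1 = pd.DataFrame({'1992': np.random.normal(32000, 200000, 3650),
                    '1993': np.random.normal(43000, 100000, 3650),
                    '1994': np.random.normal(43500, 140000, 3650),
                    '1995': np.random.normal(48000, 70000, 3650)})

means = df1.mean()
# Half-width of the 95% interval per column: 1.96 * std / sqrt(n)
ci = 1.96 * df1.std() / np.sqrt(df1.count())

fig, ax = plt.subplots(figsize=(10, 10))
# yerr draws a symmetric bar of +/- ci around each mean; capsize adds the end caps
ax.bar(df1.columns, means.to_numpy(), yerr=ci.to_numpy(), capsize=10)
fig.tight_layout()
```

Both df1.std() and describe() use the sample standard deviation, so the values should match the loop version.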

Arrange two plots horizontally

As an exercise, I'm reproducing a plot from The Economist with matplotlib.
So far, I can generate random data and produce the two plots independently. I'm now struggling to put them next to each other horizontally.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
df1 = pd.DataFrame({"broadcast": np.random.randint(110, 150,size=8),
                    "cable": np.random.randint(100, 250, size=8),
                    "streaming" : np.random.randint(10, 50, size=8)},
                   index=pd.Series(np.arange(2009,2017),name='year'))
df1.plot.bar(stacked=True)
df2 = pd.DataFrame({'usage': np.sort(np.random.randint(1,50,size=7)),
                    'avg_hour': np.sort(np.random.randint(0,3, size=7) + np.random.ranf(size=7))},
                   index=pd.Series(np.arange(2009,2016),name='year'))
plt.figure()
fig, ax1 = plt.subplots()
ax1.plot(df2['avg_hour'])
ax2 = ax1.twinx()
ax2.bar(range(2009,2016), df2['usage'])  # positional args: the left= keyword was removed in newer matplotlib
plt.show()
You should use subplots. First create a figure with plt.figure(). Then add the first subplot with fig.add_subplot(121), where the first 1 is the number of rows, 2 is the number of columns, and the last 1 is the index of the subplot. Plot the first dataframe on it, passing the created axes ax1 to the plotting call. Then add the second subplot with fig.add_subplot(122) and repeat for the second dataframe. I renamed your twin axes from ax2 to ax3, since there are now three axes on one figure. The code below produces what I believe you are looking for; you can then work on the aesthetics of each plot separately.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
df1 = pd.DataFrame({"broadcast": np.random.randint(110, 150,size=8),
                    "cable": np.random.randint(100, 250, size=8),
                    "streaming" : np.random.randint(10, 50, size=8)},
                   index=pd.Series(np.arange(2009,2017),name='year'))
ax1 = fig.add_subplot(121)
df1.plot.bar(stacked=True,ax=ax1)
df2 = pd.DataFrame({'usage': np.sort(np.random.randint(1,50,size=7)),
                    'avg_hour': np.sort(np.random.randint(0,3, size=7) + np.random.ranf(size=7))},
                   index=pd.Series(np.arange(2009,2016),name='year'))
ax2 = fig.add_subplot(122)
ax2.plot(df2['avg_hour'])
ax3 = ax2.twinx()
ax3.bar(range(2009,2016), df2['usage'])  # positional args: the left= keyword was removed in newer matplotlib
plt.show()
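The same layout can be sketched with plt.subplots(1, 2), which creates the figure and both side-by-side axes in one call. The names ax_left, ax_right and ax_twin, the figsize, and the alpha value are my own choices, not part of the original answer:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'broadcast': np.random.randint(110, 150, size=8),
                    'cable': np.random.randint(100, 250, size=8),
                    'streaming': np.random.randint(10, 50, size=8)},
                   index=pd.Series(np.arange(2009, 2017), name='year'))
df2 = pd.DataFrame({'usage': np.sort(np.random.randint(1, 50, size=7)),
                    'avg_hour': np.sort(np.random.randint(0, 3, size=7)
                                        + np.random.random(size=7))},
                   index=pd.Series(np.arange(2009, 2016), name='year'))

# One row, two columns: the two axes sit next to each other
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 4))

# Left panel: stacked bars drawn by pandas onto the given axes
df1.plot.bar(stacked=True, ax=ax_left)

# Right panel: line plus a twin y-axis carrying the bars
ax_right.plot(df2.index, df2['avg_hour'])
ax_twin = ax_right.twinx()
ax_twin.bar(df2.index, df2['usage'], alpha=0.3)

fig.tight_layout()
```

Call plt.show() to display it. Unpacking the axes from plt.subplots keeps the subplot grid in one place, which makes it easier to change the layout later.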