Edit array of axis output from pandas plot method - pandas

I'm plotting from a pandas DataFrame with subplots, and as a result I get a np.array containing several Axes objects:
array([<matplotlib.axes._subplots.AxesSubplot object at blablabla>,
<matplotlib.axes._subplots.AxesSubplot object at blablabla>,
<matplotlib.axes._subplots.AxesSubplot object at blablabla>])
I want to grab this output to edit the title and x label and save the figure as a PDF. With a single axis, I would store the output of .plot in a variable, say ax, set the title, and get the figure with fig = ax.get_figure() to save it the way I want. How can I do the same here?

Assign the result of df.plot to a variable (ax = df.plot(...)) to get the array of axes. You can then index into the array to access each Axes object and call set_title and so on, as below:
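import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))
df = df.cumsum()
# subplots=True returns an array with one Axes per column
ax = df.plot(subplots=True)
ax[0].set_title('Series A')
ax[1].set_title('Series B')
ax[2].set_title('Series C')
ax[3].set_title('Series D')
# all subplots belong to the same figure, so any Axes can be used to reach it
fig = ax[0].get_figure()
fig.tight_layout()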
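The question also asks about the x label and saving as PDF, which the snippet above does not cover. Here is a minimal sketch of one way to do it, assuming the default shared x axis that pandas uses with subplots=True. The label text "Step" and the output path are made up for illustration, and the Agg backend is only selected so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import numpy as np
import pandas as pd

# Same kind of data as above: four cumulative random walks
df = pd.DataFrame(np.random.randn(1000, 4), columns=list("ABCD")).cumsum()

# With subplots=True, df.plot returns a NumPy array holding one Axes per column
axes = df.plot(subplots=True)

# Set a title on every Axes in the array
for ax, col in zip(axes, df.columns):
    ax.set_title(f"Series {col}")

# The subplots share the x axis here, so label the bottom one
axes[-1].set_xlabel("Step")  # hypothetical label text

# All the Axes live in one Figure, so any of them can be used to reach it
fig = axes[0].get_figure()
fig.tight_layout()

# savefig picks the output format from the file extension
pdf_path = os.path.join(tempfile.gettempdir(), "series_subplots.pdf")  # hypothetical path
fig.savefig(pdf_path)
```

Looping over the array avoids repeating one set_title call per subplot, and it keeps working if the number of columns changes.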

Related

Apply `ListedColormap` on bar chart [duplicate]

I have a df with two columns:
y: different numeric values for the y axis
days: the names of four different days (Monday, Tuesday, Wednesday, Thursday)
I also have a colormap with four different colors that I made myself; it's a ListedColormap object.
I want to create a bar chart with the four categories (days of the week) in the x axis and their corresponding values in the y axis. At the same time, I want each bar to have a different color using my colormap.
This is the code I used to build my bar chart:
def my_barchart(my_df, my_cmap):
    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    ax.bar(my_df['days'], my_df['y'], color=my_cmap)
    return fig
However, I get the following error: "object of type 'ListedColormap' has no len()", so it seems that I'm not using my_cmap correctly.
If I remove that from the function and run it, my bar chart looks ok, except that all bars have the same color. So my question is: what is the right way to use a colormap with a bar chart?
The color argument wants either a string or an RGB[A] value (it can be a single colour, or a sequence of colours with one for each data point you are plotting). Colour maps are typically callable with floats in the range [0, 1].
So what you want to do is take the values you want for the colours for each bar, scale them to the range [0, 1], and then call my_cmap with those rescaled values.
So, for example, if you wanted the colours to correspond to the y values (the heights of the bars), you would modify your code like this (this assumes numpy has been imported as np earlier on):
def my_barchart(my_df, my_cmap):
    # scale the values linearly to the range [0, 1] expected by the colormap
    rescale = lambda y: (y - np.min(y)) / (np.max(y) - np.min(y))
    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    ax.bar(my_df['days'], my_df['y'], color=my_cmap(rescale(my_df['y'])))
    return fig
Here is a self-contained minimal example of using the color argument with the output from a cmap:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
my_cmap = plt.get_cmap("viridis")
rescale = lambda y: (y - np.min(y)) / (np.max(y) - np.min(y))
plt.bar(x, y, color=my_cmap(rescale(y)))
plt.savefig("temp")
Okay, I found a way to do this without having to scale my values:
def my_barchart(my_df, my_cmap):
    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    ax.bar(my_df['days'], my_df['y'], color=my_cmap.colors)
    return fig
Simply adding .colors after my_cmap works!
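For reference, here is a small self-contained sketch of the .colors approach. The day names come from the question, but the y values and the four colours are made up, since the original dataframe and colormap are not shown. It relies on ListedColormap storing its colours in the colors attribute; other colormap types may not have that attribute, in which case the rescaling approach above is the one to use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, to_rgba

days = ["Monday", "Tuesday", "Wednesday", "Thursday"]
y = [3, 7, 5, 2]  # made-up values for illustration

# Hypothetical four-colour map standing in for the asker's own colormap
my_cmap = ListedColormap(["tab:blue", "tab:orange", "tab:green", "tab:red"])

fig, ax = plt.subplots()
# .colors is the plain sequence of colours, one per bar in this case
bars = ax.bar(days, y, color=my_cmap.colors)
```

With this approach the colour depends on the position of the bar, not on its height, which is usually what you want for categories such as days of the week.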

How to increase space between bars and increase bar width in matplotlib

I am scraping a table directly from Wikipedia and plotting it. I want to increase the bar width, add space between the bars and make all bars visible. How can I do this? My code is below:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup

######### scraping #########
html= requests.get("https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria")
bsObj= BeautifulSoup(html.content, 'html.parser')
states= []
cases=[]
for items in bsObj.find("table",{"class":"wikitable sortable"}).find_all('tr')[1:37]:
    data = items.find_all(['th',{"align":"left"},'td'])
    states.append(data[0].a.text)
    cases.append(data[1].b.text)
######## Dataframe #########
table= ["STATES","CASES"]
tab= pd.DataFrame(list(zip(states,cases)),columns=table)
tab["CASES"]=tab["CASES"].replace('\n','', regex=True)
tab["CASES"]=tab["CASES"].replace(',','', regex=True)
tab['CASES'] = pd.to_numeric(tab['CASES'], errors='coerce')
tab["CASES"]=tab["CASES"].fillna(0)
tab["CASES"] = tab["CASES"].values.astype(int)
####### matplotlib ########
x=tab["STATES"]
y=tab["CASES"]
plt.cla()
plt.locator_params(axis='y', nbins=len(y)/4)
plt.bar(x,y, color="blue")
plt.xticks(fontsize= 8,rotation='vertical')
plt.yticks(fontsize= 8)
plt.show()
Use pandas.read_html and barh
pandas.read_html reads all table tags from a web page and returns a list of dataframes.
barh draws horizontal instead of vertical bars, which is useful when there are a lot of bars.
Make the figure taller if needed by increasing the second value of figsize, which is the 10 in (16.0, 10.0) in this case.
I'd recommend using a log scale for x, because Lagos has so many more cases than Kogi.
This doesn't put more space between the bars, but the plot is more legible with its increased dimensions and horizontal bars.
.iloc[:36, :5] removes some unneeded rows and columns from the dataframe.
import pandas as pd
import matplotlib.pyplot as plt
# url
url = 'https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria'
# create dataframe list
dataframe_list = pd.read_html(url) # this is a list of all the tables at the url as dataframes
# get the dataframe from the list
df = dataframe_list[2].iloc[:36, :5] # you want the dataframe at index 2
# replace '-' with 0
df.replace('–', 0, inplace=True)
# set to int
for col in df.columns[1:]:
    df[col] = df[col].astype('int')
# plot a horizontal bar
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.style.use('ggplot')
p = plt.barh(width='Cases', y='State', data=df, color='purple')
plt.xscale('log')
plt.xlabel('Number of Cases')
plt.show()
Plot all the data in df
df.set_index('State', inplace=True)
# DataFrame.plot creates its own figure, so pass figsize to it directly
df.plot.barh(figsize=(14, 14))
plt.xscale('log')
plt.show()
4 subplots
State as index
plt.figure(figsize=(14, 14))
for i, col in enumerate(df.columns, 1):
    plt.subplot(2, 2, i)
    df[col].plot.barh(label=col, color='green')
    plt.xscale('log')
    plt.legend()
plt.tight_layout()
plt.show()
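On the original question of bar width and spacing: matplotlib places categorical bars one unit apart, and the width argument of bar (height for barh) sets how much of that unit each bar fills. The default is 0.8, so a smaller value leaves a larger gap, while a larger figsize makes the bars thicker in absolute terms. A rough sketch with placeholder categories and made-up numbers (not the Wikipedia data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

states = ["State A", "State B", "State C", "State D"]  # placeholder categories
cases = [58000, 19000, 9000, 5]                        # made-up numbers, illustration only

# A larger figure gives every bar more room
fig, ax = plt.subplots(figsize=(8, 4))

# height=0.5 fills half of each one-unit slot, leaving the other half as a gap
bars = ax.barh(states, cases, height=0.5)

ax.set_xscale("log")
ax.set_xlabel("Number of Cases")
fig.tight_layout()
```

For vertical bars the same idea applies with ax.bar(states, cases, width=0.5).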

Draw various plots in one figure

I want to draw three different plots (a box plot, a histogram and a violin plot) side by side in one figure, using a single function.
I used the following code:
def box_hist_plot(data):
    sns.set()
    ax, fig = plt.subplots(1,3, figsize=(20,5))
    sns.boxplot(x=data, linewidth=2.5, ax=fig[0])
    plt.hist(x=data, bins=50, density=True, ax = fig[1])
    sns.violinplot(x = data, ax=fig[2])
and I got this error:
inner() got multiple values for argument 'ax'
Besides the fact that you should not call a Figure object ax and an array of Axes object fig, your problem comes from the line plt.hist(...,ax=...). plt.hist() should not take an ax= parameter, but is meant to act on the "current" axes. If you want to specify which Axes you want to plot, you should use Axes.hist().
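import matplotlib.pyplot as plt
import seaborn as sns

def box_hist_plot(data):
    sns.set()
    # plt.subplots returns the Figure first, then the array of Axes
    fig, axs = plt.subplots(1,3, figsize=(20,5))
    sns.boxplot(x=data, linewidth=2.5, ax=axs[0])
    # call hist on the Axes itself instead of passing ax= to plt.hist
    axs[1].hist(x=data, bins=50, density=True)
    sns.violinplot(x = data, ax=axs[2])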
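The same pattern of calling the plotting method on a specific Axes also works without seaborn. The sketch below swaps the seaborn calls for matplotlib's own Axes.boxplot and Axes.violinplot, so the styling will differ from the seaborn version. The function name box_hist_violin and the random test data are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np


def box_hist_violin(data):
    """Draw a box plot, a histogram and a violin plot side by side."""
    fig, axs = plt.subplots(1, 3, figsize=(20, 5))
    # Each call targets one Axes explicitly, so no ax= keyword is needed
    axs[0].boxplot(data)
    axs[1].hist(data, bins=50, density=True)
    axs[2].violinplot(data)
    return fig, axs


# Made-up sample data: 500 draws from a standard normal distribution
rng = np.random.default_rng(0)
fig, axs = box_hist_violin(rng.normal(size=500))
```

Returning fig and axs lets the caller adjust titles or save the figure afterwards.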

Dataframe.plot() not working when ax is defined

I am trying to emulate the span selector for the data I have according to the example shown here (https://matplotlib.org/examples/widgets/span_selector.html).
However, my data is in a dataframe & not an array.
When I plot the data by itself using the code below
input_month='2017-06'
plt.close('all')
KPI_ue_data.loc[input_month].plot(x='Order_Type', y='#_Days_#_Post_stream')
plt.show()
the data chart is shown perfectly.
However, when I try to put this into a subplot with the code below (only the first two lines and ax=ax in the plot call are added), nothing shows up, and I get no error either. Can anyone help?
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(211, facecolor='#FFFFCC')
input_month='2017-06'
plt.close('all')
KPI_ue_data.loc[input_month].plot(x='Order_Type', y='#_Days_#_Post_stream',ax=ax)
plt.show()
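I usually just take x and y from the dataframe and use ax.plot(x, y). Note that plt.close('all') is commented out below: calling it after the figure has been created closes that figure, so plt.show() has nothing left to display. For your code, it should look something like this: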
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(211, facecolor='#FFFFCC')
input_month='2017-06'
#plt.close('all')
x = KPI_ue_data.loc[(input_month), 'Order_Type']
y = KPI_ue_data.loc[(input_month), '#_Days_#_Post_stream']
ax.plot(x, y)
plt.show()
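DataFrame.plot with ax=ax should work too, as long as the figure is still open when it is shown. A likely explanation for the blank output in the question is that plt.close('all') runs after the figure is created. Here is a minimal sketch with a small made-up dataframe standing in for KPI_ue_data; the column name Days is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for one month of KPI_ue_data
df = pd.DataFrame({"Order_Type": [1, 2, 3, 4], "Days": [5.0, 3.5, 4.2, 6.1]})

# Close old figures first, before the new figure exists
plt.close("all")

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(211, facecolor="#FFFFCC")

# pandas draws onto the Axes it is given and returns that same Axes
returned_ax = df.plot(x="Order_Type", y="Days", ax=ax)
```

In an interactive session, calling plt.show() after this displays the figure, because it is still registered with pyplot.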

Secondary y axis limit in pandas plot

Is there a way to set the limit for the secondary y axis in pandas df.plot?
I have the following plotting statement. Is there a way to simply add a ylim for the secondary axis, as in "secondary_ylim=(0,1)"?
df[["Date","Col1","Col2"]].plot(x="Date",y=["Col1","Col2"],secondary_y="Col2",ylim = (0,1))
Interesting. I don't know whether there is another way to get the Axes for the secondary y axis.
But, you could do it this way:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date':pd.date_range('2019-02-01', periods=10), 'Col1':np.random.randint(0,10,10), 'Col2':np.random.randint(100,500, 10)})
ax = df[["Date","Col1","Col2"]].plot(x="Date",y=["Col1","Col2"],secondary_y="Col2", ylim = ([0,5]))
ax.set_ylim(0,5)
fig = ax.get_figure()
# the figure holds both Axes; here the second one is the secondary y axis
axes = fig.get_axes()
axes[1].set_ylim(0,250)
or, as @Stef points out, you can use right_ax:
ax = df[["Date","Col1","Col2"]].plot(x="Date",y=["Col1","Col2"],secondary_y="Col2", ylim = ([0,5]))
ax.right_ax.set_ylim(0,250)