matplotlib two charts side-by-side with third overlying the second chart - pandas

I am trying to use matplotlib (more specifically the plot method from pandas) to plot two charts side-by-side in an IPython notebook, with a third chart overlying the second chart and using a secondary y axis. However, I have been unable to get the overlay to work.
Currently this is my code:
import matplotlib.pyplot as plt
%matplotlib inline
fig, axs = plt.subplots(1,2)
fig.set_size_inches(12, 4)
top10.plot(kind='barh', ax=axs[0])
top10_time_trend.T.plot(kind='bar', stacked=True, legend=False, ax=axs[1])
time_trend.plot(kind='line', ax=axs[1], ylim=0, secondary_y=True)
I get the side-by-side structure I am looking for, but only the first (top10) and last (time_trend) plots are visible. My output is below:
When plotted separately the unshown plot (top10_time_trend) looks like this
What I am trying to accomplish is something that looks like this, i.e. the line chart overlaying the stacked bar.

The best way to do this is to create a third axis, say:
ax3 = axs[1].twinx()
and then
top10_time_trend.T.plot(kind='bar', stacked=True, legend=False, ax=ax3)
Please let me know if this works for you.
Here is an example of using twinx() from the matplotlib docs: http://matplotlib.org/examples/api/two_scales.html
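A minimal sketch of the twinx() idea, with small made-up data standing in for top10, top10_time_trend and time_trend (the real frames are not shown in the question). This variant keeps the stacked bars on axs[1] and draws the line on the twin Axes, so the line is rendered above the bars. It also plots the line against positions 0..n-1, because pandas places bars at those positions rather than at the index values. The Agg backend line is only there so the sketch runs headless; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data
years = [2010, 2011, 2012, 2013]
top10 = pd.Series([5, 3, 8], index=["a", "b", "c"])
# rows = categories, columns = years (the question transposes with .T)
top10_time_trend = pd.DataFrame(
    np.arange(12).reshape(3, 4), index=["a", "b", "c"], columns=years
)
time_trend = pd.Series([10, 12, 9, 15], index=years)

fig, axs = plt.subplots(1, 2, figsize=(12, 4))
top10.plot(kind="barh", ax=axs[0])

# Stacked bars use the left y axis of the second subplot
top10_time_trend.T.plot(kind="bar", stacked=True, legend=False, ax=axs[1])

# Twin Axes: shares x with axs[1], has its own y axis on the right,
# and is drawn on top of axs[1]
ax3 = axs[1].twinx()
# pandas puts the bars at x = 0..n-1, so plot the line at the same positions
ax3.plot(range(len(time_trend)), time_trend.values, color="black", marker="o")
ax3.set_ylim(bottom=0)
```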

Related

re-plot existing figures in a new subplot in matplotlib

Say that, in the process of visualizing data, I later decide to combine two existing, heavily annotated plots into subplots so I can compare them side by side in the same window.
Yes, I could go back and recreate these as subplots in the first place. But is there a way to grab the axes or figure handles (I don't understand how it all works), or whatever captures all the content of the individual figures, and use that to create the new subplots?
Something along the lines of:
from numpy.random import seed
from numpy.random import randint
import matplotlib.pyplot as plt
x=list(range(1,11))
seed(1)
y1= randint(5, 35, 10)
seed(2)
y2= randint(5, 35, 10)
seed(3)
y3=randint(5, 35, 10)
fig1=plt.figure(1)
ax1=plt.plot(x,y1,x,y2,x,y3)
plt.xlabel('Xaxis')
plt.ylabel('yaxis')
plt.title('Some Plot')
plt.text(10,10, 'some text')
fig2=plt.figure(2)
ax2= plt.plot(x,y1+y2+y3)
# Later, say I decided I wanted to also display the two plots as subplots in the same window.
fig3 = plt.figure(3)
plt.subplot(2,1,1)
plt.plot(ax1) # plt.plot(fig1.lines),plt.plot(fig1)
plt.subplot(2,1,2)
plt.plot(ax2)
I'm looking for a simple way to grab all the content already plotted in figures 1 and 2 and passing it directly to the subplots on figure 3.
In your code, ax1 and ax2 are each a list of Line2D objects (the return value of plt.plot). You can extract each line's data with .get_data() and plot it again:
# Later, say I decided I wanted to also display the two plots as subplots in the same window.
fig3 = plt.figure(3)
plt.subplot(2, 1, 1)
for line in ax1:
    plt.plot(*line.get_data())
plt.subplot(2, 1, 2)
for line in ax2:
    plt.plot(*line.get_data())
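Copying only the data loses the annotations. A minimal sketch that also carries over the axis labels, the title and plain text objects is below. copy_axes_content is a hypothetical helper written for this example; it handles only those artist types, so legends, patches, images and text styling are not copied. It re-draws from extracted values because matplotlib artists belong to the figure they were created in and are not meant to be moved between figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
from numpy.random import randint, seed


def copy_axes_content(src, dst):
    """Hypothetical helper: redraw lines, axis labels, title and text from src onto dst.

    Only these artist types are copied; legends, patches, images, etc. are not.
    """
    for line in src.get_lines():
        x, y = line.get_data()
        dst.plot(x, y, color=line.get_color(), linestyle=line.get_linestyle(),
                 marker=line.get_marker())
    dst.set_xlabel(src.get_xlabel())
    dst.set_ylabel(src.get_ylabel())
    dst.set_title(src.get_title())
    for txt in src.texts:
        x, y = txt.get_position()
        dst.text(x, y, txt.get_text())


x = list(range(1, 11))
seed(1)
y1 = randint(5, 35, 10)
seed(2)
y2 = randint(5, 35, 10)
seed(3)
y3 = randint(5, 35, 10)

# Figures 1 and 2, built as in the question
fig1 = plt.figure(1)
plt.plot(x, y1, x, y2, x, y3)
plt.xlabel('Xaxis')
plt.ylabel('yaxis')
plt.title('Some Plot')
plt.text(10, 10, 'some text')
fig2 = plt.figure(2)
plt.plot(x, y1 + y2 + y3)

# Figure 3: copy each source Axes into its own subplot
fig3, (top, bottom) = plt.subplots(2, 1)
copy_axes_content(fig1.axes[0], top)
copy_axes_content(fig2.axes[0], bottom)
```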

Plots getting replaced instead of showing a new plot

I am trying to create multiple plots in my Jupyter notebook. However, when I create one, it replaces the one before it instead of creating a brand new graph. For example:
#plotting revenue_adj vs vote_average data
df.plot.scatter(x='revenue_adj',y='vote_average',s=.5,title='Average Movie Vote per Budget',figsize=(8,5));
creates a scatter plot, but when I try to plot below it (on a new code line),
df.groupby('genres')['vote_average'].mean().plot()
it replaces the above plot instead of creating a new one under that code. What is going on?
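Remember that pandas' plotting functions actually use matplotlib.
So you can use matplotlib's figure() or subplots() functions to create new figures:
import matplotlib.pyplot as plt
df.plot.scatter(x='revenue_adj', y='vote_average', s=.5, figsize=(8, 5))
fig = plt.figure()  # new, empty figure: the next plot goes here
df.groupby('genres')['vote_average'].mean().plot()
# or, using subplots()
fig, ax = plt.subplots(1, 2)
df.plot.scatter(x='revenue_adj', y='vote_average', s=.5, ax=ax[0])
df.groupby('genres')['vote_average'].mean().plot(ax=ax[1])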
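A self-contained sketch of the explicit-Axes approach, using a small made-up df with the question's column names. Creating each figure yourself and passing its Axes through ax= means the second call has no way to draw onto the first plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with the column names used in the question
df = pd.DataFrame({
    "revenue_adj": [1e6, 5e6, 2e7, 8e6, 3e6],
    "vote_average": [6.1, 7.3, 5.8, 6.9, 7.0],
    "genres": ["Drama", "Comedy", "Drama", "Action", "Comedy"],
})

# Figure 1: created explicitly, its Axes handed to pandas
fig1, ax1 = plt.subplots(figsize=(8, 5))
df.plot.scatter(x="revenue_adj", y="vote_average", s=0.5,
                title="Average Movie Vote per Budget", ax=ax1)

# Figure 2: a separate figure, so the scatter plot is left untouched
fig2, ax2 = plt.subplots()
df.groupby("genres")["vote_average"].mean().plot(ax=ax2)
```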

Second Matplotlib figure doesn't save to file

I've drawn a plot that looks something like the following:
It was created using the following code:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# 1. Plot a figure consisting of 3 separate axes
# ==============================================
plotNames = ['Plot1','Plot2','Plot3']
figure, axisList = plt.subplots(len(plotNames), sharex=True, sharey=True)
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range('2015-01-01','2015-12-31',freq='D')
tempDF['value'] = np.random.randn(tempDF['date'].size)
tempDF['value2'] = np.random.randn(tempDF['date'].size)
for i in range(len(plotNames)):
    axisList[i].plot(tempDF['date'], tempDF['value'], 'b-')  # date x axis, like plot_date
# 2. Create a new single axis in the figure. This new axis sits over
# the top of the axes drawn previously. Make all the components of
# the new single axis invisible except for the x and y labels.
big_ax = figure.add_subplot(111)
big_ax.set_facecolor('none')  # set_facecolor() replaces the old set_axis_bgcolor()
big_ax.set_xlabel('Date', fontweight='bold')
big_ax.set_ylabel('Random normal', fontweight='bold')
big_ax.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
big_ax.spines['right'].set_visible(False)
big_ax.spines['top'].set_visible(False)
big_ax.spines['left'].set_visible(False)
big_ax.spines['bottom'].set_visible(False)
# 3. Plot a separate figure
# =========================
figure2,ax2 = plt.subplots()
ax2.plot(tempDF['date'], tempDF['value2'], '-', color='green')
ax2.set_xlabel('Date',fontweight='bold')
ax2.set_ylabel('Random normal',fontweight='bold')
# Save plot
# =========
plt.savefig('tempPlot.png',dpi=300)
Basically, the rationale for plotting the whole picture is as follows:
1. Create the first figure and plot 3 separate axes using a loop.
2. Plot a single axis in the same figure to sit on top of the graphs drawn previously. Label the x and y axes, and make all other aspects of this axis invisible.
3. Create a second figure and plot data on a single axis.
The plot displays just as I want in Jupyter Notebook, but when the plot is saved, the file contains only the second figure.
I was under the impression that plots could have multiple figures and that figures could have multiple axes. However, I suspect I have a fundamental misunderstanding of the differences between plots, subplots, figures and axes. Can someone please explain what I'm doing wrong and how to get the whole image saved to a single file?
Matplotlib does not have "plots". In that sense,
plots are figures
subplots are axes
During the runtime of a script you can have as many figures as you wish. Calling plt.savefig() will save the currently active figure, i.e. the figure you would get by calling plt.gcf().
You can save any other figure either by providing a figure number num:
plt.figure(num)
plt.savefig("output.png")
or by having a reference to the figure object fig1:
fig1.savefig("output.png")
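A short sketch of the difference, with throwaway data and temporary file paths: plt.savefig() writes whatever plt.gcf() returns, while fig.savefig() writes that particular figure no matter which one is current. In the question, calling figure.savefig(...) on the first figure's variable would therefore save the three-panel figure.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots()
ax1.plot([1, 2, 3], [1, 4, 9])
fig2, ax2 = plt.subplots()  # fig2 is now the "current" figure
ax2.plot([1, 2, 3], [3, 2, 1])

outdir = tempfile.mkdtemp()
first = os.path.join(outdir, "first.png")
second = os.path.join(outdir, "second.png")

fig1.savefig(first)   # saves fig1 even though fig2 is current
plt.savefig(second)   # plt.savefig targets plt.gcf(), i.e. fig2
```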
In order to save several figures into one file, one could go the way detailed here: Python saving multiple figures into one PDF file.
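One standard way to put several figures into a single file is matplotlib's PdfPages, which writes each figure as a separate page of one PDF. A minimal sketch with placeholder data and a temporary path (not necessarily the same approach as the linked question):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

fig1, ax1 = plt.subplots()
ax1.plot([0, 1, 2], [0, 1, 4])
fig2, ax2 = plt.subplots()
ax2.plot([0, 1, 2], [4, 1, 0])

path = os.path.join(tempfile.mkdtemp(), "all_figures.pdf")
with PdfPages(path) as pdf:
    for fig in (fig1, fig2):
        pdf.savefig(fig)  # one page per figure
```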
Another option would be not to create several figures, but a single one, using subplots:
fig = plt.figure()
ax = fig.add_subplot(611)
ax2 = fig.add_subplot(612)
ax3 = fig.add_subplot(613)
ax4 = fig.add_subplot(212)
and then plot the respective graphs to those axes using
ax.plot(x,y)
or in the case of a pandas dataframe df
df.plot(x="column1", y="column2", ax=ax)
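Putting this together with data shaped like the question's, a sketch that draws everything into one figure and saves it with a single call. The random data and the output path are placeholders. 611, 612 and 613 occupy the top three rows of a 6-row grid (the top half of the figure), and 212 is the bottom half of a 2-row grid.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data shaped like tempDF in the question
df = pd.DataFrame({"date": pd.date_range("2015-01-01", "2015-12-31", freq="D")})
rng = np.random.default_rng(0)
df["value"] = rng.standard_normal(len(df))
df["value2"] = rng.standard_normal(len(df))

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(611)    # the three small panels
ax2 = fig.add_subplot(612)
ax3 = fig.add_subplot(613)
ax4 = fig.add_subplot(212)   # what used to be the second figure

for small_ax in (ax, ax2, ax3):
    df.plot(x="date", y="value", ax=small_ax, legend=False)
df.plot(x="date", y="value2", ax=ax4, color="green", legend=False)

path = os.path.join(tempfile.mkdtemp(), "tempPlot.png")
fig.savefig(path, dpi=100)   # one file containing all four Axes
```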
This second option can of course be generalized to arbitrary Axes positions using subplots on grids, as detailed in the matplotlib user's guide section Customizing Location of Subplot Using GridSpec.
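As a sketch, the same kind of layout expressed with GridSpec: slicing the grid lets one Axes span several rows. The 6-row grid here is just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 6))
gs = GridSpec(6, 1, figure=fig)                              # 6 rows, 1 column
small_axes = [fig.add_subplot(gs[i, 0]) for i in range(3)]  # rows 0, 1, 2
big_ax = fig.add_subplot(gs[3:, 0])                          # rows 3-5 as one Axes
```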
Furthermore, it is possible to position an axes (a subplot so to speak) at any position in the figure using fig.add_axes([left, bottom, width, height]) (where left, bottom, width, height are in figure coordinates, ranging from 0 to 1).
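A minimal sketch of fig.add_axes() with figure-fraction coordinates, placing a small inset Axes in the upper right of a larger one. The rectangle values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])     # [left, bottom, width, height]
inset_ax = fig.add_axes([0.6, 0.6, 0.25, 0.25])  # small Axes in the upper right
main_ax.plot([0, 1, 2], [0, 1, 4])
inset_ax.plot([0, 1, 2], [4, 1, 0])
```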

Creating a bar plot using Seaborn

I am trying to plot bar chart using seaborn. Sample data:
x=[1,1000,1001]
y=[200,300,400]
cat=['first','second','third']
df = pd.DataFrame(dict(x=x, y=y,cat=cat))
When I use:
sns.factorplot("x","y", data=df,kind="bar",palette="Blues",size=6,aspect=2,legend_out=False);
The figure produced is
When I add the legend
sns.factorplot("x","y", data=df,hue="cat",kind="bar",palette="Blues",size=6,aspect=2,legend_out=False);
The resulting figure looks like this
As you can see, the bar is shifted from the value. I don't know how to get the same layout as I had in the first figure and add the legend.
I am not necessarily tied to seaborn, I like the color palette, but any other approach is fine with me. The only requirement is that the figure looks like the first one and has the legend.
It looks like the issue arises here, from the seaborn.factorplot docs:
hue : string, optional
Variable name in data for splitting the plot by color. In the case of kind="bar", this also influences the placement on the x axis.
So, since seaborn uses matplotlib, you can do it like this:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x = [1, 1000, 1001]
y = [200, 300, 400]
plt.figure(figsize=(8, 4))
nd = np.arange(3)  # evenly spaced bar positions 0, 1, 2
bars = plt.bar(nd, y, color=sns.color_palette("Blues", 3))
plt.xticks(nd, [str(v) for v in x])  # bars are centred on nd by default
plt.legend(bars, ['First', 'Second', 'Third'], loc="upper left", title="cat")
plt.show()
Added @mwaskom's method to get the three sns colors.
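In more recent seaborn versions, where factorplot was renamed catplot, an alternative that stays within seaborn is barplot with hue and dodge=False. x is still treated as categorical, so the three bars stay evenly spaced, and dodge=False stops hue from shifting them sideways. This is a sketch, not checked against every seaborn release.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame(dict(x=[1, 1000, 1001], y=[200, 300, 400],
                       cat=["first", "second", "third"]))

fig, ax = plt.subplots(figsize=(8, 4))
# dodge=False keeps each bar centred on its x category even though hue is set
sns.barplot(data=df, x="x", y="y", hue="cat", dodge=False,
            palette="Blues", ax=ax)
```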

ylabel using function subplots in matplotlib

I recently found the function subplots, which seems to be a more elegant way of setting up multiple subplots than subplot. However, I don't seem to be able to change the properties of the axes for each subplot.
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
x = np.linspace(0, 20, 100)
fig, axes = plt.subplots(nrows=2)
for i in range(10):
    axes[0].plot(x, i * (x - 10)**2)
plt.ylabel('plot 1')
for i in range(10):
    axes[1].plot(x, i * np.cos(x))
plt.ylabel('plot 2')
plt.show()
Only the ylabel for the last plot is shown. The same happens for xlabel, xlim and ylim.
I realise that the point of using subplots is to create common layouts of subplots, but if sharex and sharey are set to false, then shouldn't I be able to change some parameters?
One solution would be to use the subplot function instead, but do I need to do this?
Yes, you probably want to use the individual subplot instances.
As you've found, plt.ylabel sets the ylabel of the last active plot. To change the parameters of an individual Axes, i.e. subplot, you can use any one of the available methods. To change the ylabel, you can use axes[0].set_ylabel('plot 1').
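Applied to the question's code, the per-Axes calls look like this. The xlim and ylim values are arbitrary, only there to show that the other set_* methods work on each Axes in the same way.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 20, 100)
fig, axes = plt.subplots(nrows=2)

for i in range(10):
    axes[0].plot(x, i * (x - 10)**2)
axes[0].set_ylabel('plot 1')  # targets this Axes, not the "current" one
axes[0].set_xlim(0, 20)

for i in range(10):
    axes[1].plot(x, i * np.cos(x))
axes[1].set_ylabel('plot 2')
axes[1].set_ylim(-10, 10)
```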
pyplot, or plt as you've defined it, is a helper module for quickly accessing Axes and Figure methods without needing to store these objects in variables. As the documentation states:
[Pyplot p]rovides a MATLAB-like plotting framework.
You can still use this interface, but you will need to adjust which Axes is the currently active Axes. To do this, pyplot has an sca(ax) function, where ax is an instance of Axes (older matplotlib versions also accepted plt.axes(ax) for this). So in your example, you would call plt.sca(axes[0]) to make the first subplot active, then plt.sca(axes[1]) to switch to the other.
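A sketch of the pyplot-style approach applied to the question's code, using plt.sca(ax) to choose which subplot the following plt.* calls affect:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 20, 100)
fig, axes = plt.subplots(nrows=2)

plt.sca(axes[0])  # make the first subplot current
for i in range(10):
    plt.plot(x, i * (x - 10)**2)
plt.ylabel('plot 1')

plt.sca(axes[1])  # switch to the second subplot
for i in range(10):
    plt.plot(x, i * np.cos(x))
plt.ylabel('plot 2')
```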