How can I remove or change text from a matplotlib plot?

I have a bar plot that is returned to me (I have access to the AxesSubplot object) and that already has some labels on the bars. The labels are illegible, and I would like to enlarge them (or clear and reset them). Take the following code for example:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'a':['red','green','blue'], 'b':[4,8,12]})
plot = df.plot(kind='barh')
for i in plot.patches:
    plot.text(i.get_width()+.01, i.get_y()+.38, str(i.get_width()), fontsize=31)
This generates a nice bar plot with labels on the bars. But let's say I want to remove or change those labels. How can this be done?

You can access the text objects using plot.texts. In your example, you get:
>>> plot.texts
[Text(4.01,0.13,'4'), Text(8.01,1.13,'8'), Text(12.01,2.13,'12')]
You can hide them all in a loop (to delete them from the axes entirely, iterate over list(plot.texts) and call t.remove() instead):
for t in plot.texts:
    t.set_visible(False)
Or change attributes (fontsize for example) in a similar manner:
for t in plot.texts:
    # Reduce fontsize to 10:
    t.set_fontsize(10)
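If hiding the labels is not enough, they can also be deleted and redrawn. The following is a minimal sketch, assuming the same DataFrame as in the question; the Agg backend line, the variable names, and the new offset and font size are illustrative choices rather than part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'a': ['red', 'green', 'blue'], 'b': [4, 8, 12]})
ax = df.plot(kind='barh')

# Recreate the oversized labels from the question.
for p in ax.patches:
    ax.text(p.get_width() + .01, p.get_y() + .38, str(p.get_width()), fontsize=31)
n_before = len(ax.texts)

# Iterate over a copy, because remove() changes ax.texts while we loop.
for t in list(ax.texts):
    t.remove()
n_after = len(ax.texts)

# Draw replacement labels with a smaller font (offsets are arbitrary).
for p in ax.patches:
    ax.text(p.get_width() + .01, p.get_y() + .1, str(p.get_width()), fontsize=10)
```

Removing is the better choice when the labels will be replaced, since hidden Text objects would otherwise stay attached to the axes.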

Related

How remove a specific legend label in the Dataframe.plot?

I am trying to plot two DataFrames together, one in 'bar' style and one in 'line' style, but I have trouble showing the legend only for the bars, excluding the line.
Here is my code:
import numpy as np
import pandas as pd
np.random.seed(5)
df = pd.DataFrame({'2012':np.random.random_sample((4,)),'2014':np.random.random_sample((4,))})
df.index = ['A','B','C','D']
sumdf = df.T.apply(np.sum,axis=1)
ax = df.T.plot.bar(stacked=True)
sumdf.plot(ax=ax)
ax.set_xlim([-0.5,1.5])
ax.set_ylim([0,3])
ax.legend(loc='upper center',ncol=3,framealpha=0,labelspacing=0,handlelength=4,borderaxespad=0)
Annoyingly, the resulting figure also shows the line's entry in the legend box. I want to remove that entry rather than make it invisible, but I cannot find a way to do so.
Thank you!
If an artist's label starts with an underscore, matplotlib does not show it in the legend by default.
You can simply change
sumdf.plot(ax=ax)
to
sumdf.plot(ax=ax, label='_')
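The underscore rule can be checked in isolation with plain matplotlib. This is a small sketch with made-up data; the label text '_total' and the variable names are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1], [1.0, 2.0], label='bars')
# A label starting with "_" is skipped when the legend is built automatically.
ax.plot([0, 1], [1.0, 2.0], color='k', label='_total')

legend = ax.legend()
shown = [text.get_text() for text in legend.get_texts()]
print(shown)
```

Only the bar entry should appear in the printed list, while the line itself is still drawn.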

Second Matplotlib figure doesn't save to file

I've drawn a plot that looks something like the following:
It was created using the following code:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# 1. Plot a figure consisting of 3 separate axes
# ==============================================
plotNames = ['Plot1','Plot2','Plot3']
figure, axisList = plt.subplots(len(plotNames), sharex=True, sharey=True)
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range('2015-01-01','2015-12-31',freq='D')
tempDF['value'] = np.random.randn(tempDF['date'].size)
tempDF['value2'] = np.random.randn(tempDF['date'].size)
for i in range(len(plotNames)):
    axisList[i].plot_date(tempDF['date'],tempDF['value'],'b-',xdate=True)
# 2. Create a new single axis in the figure. This new axis sits over
# the top of the axes drawn previously. Make all the components of
# the new single axis invisible except for the x and y labels.
big_ax = figure.add_subplot(111)
big_ax.set_axis_bgcolor('none')
big_ax.set_xlabel('Date',fontweight='bold')
big_ax.set_ylabel('Random normal',fontweight='bold')
big_ax.tick_params(labelcolor='none', top='off', bottom='off', left='off', right='off')
big_ax.spines['right'].set_visible(False)
big_ax.spines['top'].set_visible(False)
big_ax.spines['left'].set_visible(False)
big_ax.spines['bottom'].set_visible(False)
# 3. Plot a separate figure
# =========================
figure2,ax2 = plt.subplots()
ax2.plot_date(tempDF['date'],tempDF['value2'],'-',xdate=True,color='green')
ax2.set_xlabel('Date',fontweight='bold')
ax2.set_ylabel('Random normal',fontweight='bold')
# Save plot
# =========
plt.savefig('tempPlot.png',dpi=300)
Basically, the rationale for plotting the whole picture is as follows:
1. Create the first figure and plot 3 separate axes using a loop.
2. Plot a single axis in the same figure to sit on top of the graphs drawn previously. Label the x and y axes. Make all other aspects of this axis invisible.
3. Create a second figure and plot data on a single axis.
The plot displays just as I want when using jupyter-notebook but when the plot is saved, the file contains only the second figure.
I was under the impression that plots could have multiple figures and that figures could have multiple axes. However, I suspect I have a fundamental misunderstanding of the differences between plots, subplots, figures and axes. Can someone please explain what I'm doing wrong and how to get the whole image to save to a single file?
Matplotlib does not have "plots". In that sense,
plots are figures
subplots are axes
During the runtime of a script you can have as many figures as you wish. Calling plt.savefig() will save the currently active figure, i.e. the figure you would get by calling plt.gcf().
You can save any other figure either by providing a figure number num:
plt.figure(num)
plt.savefig("output.png")
or by having a reference to the figure object fig1:
fig1.savefig("output.png")
In order to save several figures into one file, one could go the way detailed here: Python saving multiple figures into one PDF file.
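As a rough sketch of both ideas, the code below saves two figures through their own references and then writes both into one multi-page PDF with PdfPages from matplotlib.backends.backend_pdf. The data, the file names, and the temporary output directory are placeholders, not taken from the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

fig1, ax1 = plt.subplots()
ax1.plot([0, 1, 2], [0, 1, 4])
fig2, ax2 = plt.subplots()
ax2.plot([0, 1, 2], [4, 1, 0])

out_dir = tempfile.mkdtemp()  # hypothetical output location
png1 = os.path.join(out_dir, "first.png")
png2 = os.path.join(out_dir, "second.png")
pdf_path = os.path.join(out_dir, "both.pdf")

# Each figure is saved through its own reference, independent of plt.gcf().
fig1.savefig(png1)
fig2.savefig(png2)

# One PDF file with one page per figure.
with PdfPages(pdf_path) as pdf:
    pdf.savefig(fig1)
    pdf.savefig(fig2)
```

Note that a multi-page PDF keeps the figures on separate pages; to get everything into one image, all axes have to live in a single figure.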
Another option would be not to create several figures, but a single one, using subplots,
fig = plt.figure()
ax = fig.add_subplot(611)
ax2 = fig.add_subplot(612)
ax3 = fig.add_subplot(613)
ax4 = fig.add_subplot(212)
and then plot the respective graphs to those axes using
ax.plot(x,y)
or in the case of a pandas dataframe df
df.plot(x="column1", y="column2", ax=ax)
This second option can of course be generalized to arbitrary axes positions using subplots on grids. This is detailed in the matplotlib user's guide, "Customizing Location of Subplot Using GridSpec".
Furthermore, it is possible to position an axes (a subplot so to speak) at any position in the figure using fig.add_axes([left, bottom, width, height]) (where left, bottom, width, height are in figure coordinates, ranging from 0 to 1).
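A possible way to combine the last two suggestions is sketched below: three thin axes stacked in the top half of a grid, one larger axes in the bottom half, and one extra axes placed freely with fig.add_axes. The grid shape, the coordinates, and the plotted data are arbitrary illustrative values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure()
gs = GridSpec(6, 1)  # 6 rows, 1 column

# Rows 0, 1 and 2 each hold one thin axes; rows 3 to 5 are merged into one.
top_axes = [fig.add_subplot(gs[row, 0]) for row in range(3)]
bottom_ax = fig.add_subplot(gs[3:, 0])

# [left, bottom, width, height] in figure coordinates (0 to 1).
inset_ax = fig.add_axes([0.65, 0.15, 0.2, 0.2])

for ax in top_axes + [bottom_ax, inset_ax]:
    ax.plot([0, 1, 2], [0, 1, 0])
```

Because everything belongs to one figure, a single fig.savefig call would write all five axes to the same file.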

Modify an errorbar extent in pandas barplot

I'm plotting data with a pandas barplot that includes errorbars (that are symmetric around the bar top), and I would like to modify the extent of one single errorbar in this plot, so that it shows only on half of it. How can I do that?
Here's a concrete example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
bars = pd.DataFrame(np.random.randn(2,2), index=['a','b'], columns=['c','d'])
errs = pd.DataFrame(np.random.randn(2,2), index=['a','b'], columns=['c','d'])
ax = bars.plot.barh(color=['r','g'],xerr=errs)
which yields a plot like that:
I'm trying to a posteriori access and modify the extent of the errorbar of index a and column d so that it shows only the first half of it, i.e. a segment [bar_top-err, bar_top] instead of [bar_top-err, bar_top+err]. I attempted to retrieve the following matplotlib object:
plt.getp(ax.get_children()[1],'paths')[0]
which, if I'm not mistaken, is a Bbox, and describes the right object, but I can't get to modify it in my plot. Any idea on how to do that?
You were almost there, just need to modify and update the coordinates in path.vertices. I took the liberty to assume that you want the error bar to face "away from zero", instead of just showing the negative part of it:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
bars = pd.DataFrame(np.random.randn(2,2), index=['a','b'], columns=['c','d'])
errs = pd.DataFrame(np.random.randn(2,2), index=['a','b'], columns=['c','d'])
ax = bars.plot.barh(color=['r','g'], xerr=errs)
child = ax.get_children()[1]
path = plt.getp(child, 'paths')[0]
bar_top = path.vertices.mean(axis=0)[0]
# replace the right tail if bar is negative or left tail if it's positive
method = np.argmin if np.sign(bar_top)==1 else np.argmax
idx = method(path.vertices, axis=0)[0]
path.vertices[idx, 0] = bar_top
plt.savefig('figs/hack-linecollections.png', dpi=150)
plt.show()
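If the bars can be drawn with matplotlib directly, a different route from the path-editing approach above is to pass asymmetric errors from the start: xerr accepts a 2 x N array whose first row holds the lower errors and whose second row holds the upper errors. This is only a sketch with made-up values; setting an upper error to 0 produces the one-sided bar.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.array([3.0, 5.0])  # hypothetical bar lengths
lower = np.array([0.5, 1.0])   # error extent towards smaller x
upper = np.array([0.5, 0.0])   # second bar: no error bar beyond the bar top

fig, ax = plt.subplots()
# Row 0 of xerr is the lower error, row 1 the upper error.
container = ax.barh([0, 1], values, xerr=np.vstack([lower, upper]),
                    color=['r', 'g'])
```

This avoids depending on the order of ax.get_children(), at the cost of building the plot outside of pandas.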

Creating a bar plot using Seaborn

I am trying to plot a bar chart using seaborn. Sample data:
x=[1,1000,1001]
y=[200,300,400]
cat=['first','second','third']
df = pd.DataFrame(dict(x=x, y=y,cat=cat))
When I use:
sns.factorplot("x","y", data=df,kind="bar",palette="Blues",size=6,aspect=2,legend_out=False);
The figure produced is
When I add the legend
sns.factorplot("x","y", data=df,hue="cat",kind="bar",palette="Blues",size=6,aspect=2,legend_out=False);
The resulting figure looks like this
As you can see, the bar is shifted from the value. I don't know how to get the same layout as I had in the first figure and add the legend.
I am not necessarily tied to seaborn, I like the color palette, but any other approach is fine with me. The only requirement is that the figure looks like the first one and has the legend.
It looks like this issue arises from the following behavior, described in the seaborn.factorplot docs:
hue : string, optional
Variable name in data for splitting the plot by color. In the case of kind="bar", this also influences the placement on the x axis.
So, since seaborn uses matplotlib, you can do it like this:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x=[1,1000,1001]
y=[200,300,400]
sns.set_context(rc={"figure.figsize": (8, 4)})
nd = np.arange(3)
width=0.8
plt.xticks(nd+width/2., ('1','1000','1001'))
plt.xlim(-0.15,3)
# align='edge' keeps the ticks at nd+width/2 centered under the bars
bars = plt.bar(nd, y, width, align='edge', color=sns.color_palette("Blues",3))
plt.legend(bars, ['First','Second','Third'], loc = "upper left", title = "cat")
plt.show()
Added @mwaskom's method to get the three sns colors.
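For comparison, here is a matplotlib-only sketch of the same idea that gives each bar its own label, so the legend is built without passing handles explicitly. The Blues colormap sampling is a stand-in for the seaborn palette, and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = [1, 1000, 1001]
y = [200, 300, 400]
cat = ['first', 'second', 'third']

positions = np.arange(len(x))
# Three shades sampled from the Blues colormap (stand-in for a seaborn palette).
colors = plt.get_cmap('Blues')(np.linspace(0.4, 0.9, len(x)))

fig, ax = plt.subplots(figsize=(8, 4))
for pos, height, name, color in zip(positions, y, cat, colors):
    # One bar per call, so each bar gets its own legend entry.
    ax.bar(pos, height, color=tuple(color), label=name)

ax.set_xticks(positions)
ax.set_xticklabels([str(v) for v in x])
legend = ax.legend(loc='upper left', title='cat')
```

The bars stay evenly spaced and sit directly on their ticks, since the x values are used only as tick labels.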

Colors for pandas timeline graphs with many series

I am using pandas for graphing data for a cluster of nodes. I find that pandas is repeating color values for the different series, which makes them indistinguishable.
I tried giving custom color values like this and passed the my_colors to the colors field in plot:
my_colors = []
for node in nodes_list:
    my_colors.append(rand_color())
rand_color() is defined as follows:
def rand_color():
    from random import randrange
    return "#%s" % "".join([hex(randrange(16, 255))[2:] for i in range(3)])
But here also I need to avoid color values that are too close to distinguish. I sometimes have as many as 60 nodes (series). Most probably a hard-coded list of color values would be best option?
You can get a list of colors from any colormap defined in Matplotlib, and even custom colormaps, by:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> colors = plt.cm.Paired(np.linspace(0,1,60))
Plotting an example with these colors:
>>> plt.scatter( range(60), [0]*60, color=colors )
<matplotlib.collections.PathCollection object at 0x04ED2830>
>>> plt.axis("off")
(-10.0, 70.0, -0.0015, 0.0015)
>>> plt.show()
I found the "Paired" colormap to be especially useful for this kind of thing, but you can use any other available or custom colormap.
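The sketch below applies the same sampling idea to a pandas plot with 60 series. It uses viridis purely as an example of a continuous colormap: as far as I know, recent Matplotlib releases define Paired as a qualitative map with only 12 entries, so 60 samples from it may contain repeats. The random data and the variable names are made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n_series = 60
cmap = plt.get_cmap('viridis')  # any continuous colormap could be used here

# Sample n_series evenly spaced RGBA colors and convert them to plain tuples.
colors = [tuple(float(v) for v in rgba)
          for rgba in cmap(np.linspace(0, 1, n_series))]

# Made-up data: one random walk per node.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(20, n_series).cumsum(axis=0))

ax = df.plot(color=colors, legend=False)
```

Neighbouring colors in a continuous map are still close to each other, so with this many series it may help to vary line styles or markers as well.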