Seaborn countplot -- trouble adding counts to top of bars - matplotlib

I have a survey dataset for which I'm trying to develop a number of summary bar plots.
I have successfully created the plots (with hatching, in case I need to publish in black and white), but I'd like to add counts to the top of the bars and can't seem to get this right.
See the code below (the "bar.annotate..." line). It displays the plot successfully but doesn't display the counts at the top of the bars.
import seaborn as sns
import matplotlib.pyplot as plt
df = survey['Which of the following best describes your position?'] # it's not a dataframe, but if single columns, it's a series
df = df.rename("Fig01: Which of the following best describes your position?")
# Set style
sns.set(style="whitegrid", color_codes=True)
# Make the barplot
#bar = sns.barplot(x=df.index, hue="class", data=df)
bar = sns.countplot(x=df.index, data=df)
bar.set_title("Fig01: Which of the following best describes your position?")
# Define some hatches
#hatches = ['--', '++', 'Xx', '\\', '**', 'oo']
hatches = ['+', 'xx', '\\\\'] # repeating letters increases the density
# Loop over the bars
for i,thisbar in enumerate(bar.patches):
    # Set a different hatch for each bar
    thisbar.set_hatch(hatches[i])
    bar.annotate('%{:d}'.format(thisbar.get_height()), (thisbar.get_x(), thisbar.get_height()+50))
plt.xticks(rotation=-45)
plt.show()
Here's what the output currently looks like (again, I'm trying to get the counts to display at the top of the bars).
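Two things in the snippet above may be worth checking: the `{:d}` format code rejects float values (which is what `get_height()` typically returns), and an offset of +50 in data units can place the annotation point beyond the visible y-range when the counts are small. Below is a minimal sketch of the patch-annotation idea. It uses plain matplotlib bars instead of `sns.countplot`, and the survey answers are made up for illustration; the loop over the patches is the part that should carry over.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up answers standing in for the real survey column (assumption)
answers = ["Manager", "Engineer", "Engineer", "Analyst", "Engineer", "Manager"]
counts = Counter(answers)

fig, ax = plt.subplots()
bars = ax.bar(list(counts.keys()), list(counts.values()),
              color="white", edgecolor="black")
hatches = ['+', 'xx', '\\\\']

labels = []
for i, patch in enumerate(bars.patches):
    patch.set_hatch(hatches[i % len(hatches)])
    height = patch.get_height()
    # Anchor the text at the top centre of the bar and nudge it up by a few
    # points (screen units) rather than by a fixed amount in data units.
    text = ax.annotate(
        "{:d}".format(int(height)),
        xy=(patch.get_x() + patch.get_width() / 2, height),
        xytext=(0, 3), textcoords="offset points",
        ha="center", va="bottom",
    )
    labels.append(text.get_text())
```

Because the offset is given in points, the label stays just above each bar regardless of the scale of the y-axis.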

Plotly.express Area multiple plots data error

This is my code:
import pandas as pd
import plotly.express as px
df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,6],'y3':[3,4,5,6,7]}
df=pd.DataFrame(df)
fig = px.area(df, x="x", y=['y1','y2','y3'])
fig.show()
As you can see, my Y data have a maximum of 7. Why does the figure show wrong values?
plotly.express.area creates a stacked area plot, where each filled area corresponds to one column of the input data: https://plotly.com/python/filled-area-plots/
For example, at x = 5, the stacked area plot reaches y = 18 = 5 + 6 + 7.
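The stacking arithmetic can be checked by hand. The sketch below uses plain Python lists (no plotly) with the data from the question and computes the upper edge of each stacked band as a running sum over the columns; the variable names are only illustrative.

```python
# Same data as in the question
data = {'y1': [1, 2, 3, 4, 5], 'y2': [2, 3, 4, 5, 6], 'y3': [3, 4, 5, 6, 7]}

# Upper edge of each stacked band: running sum over the columns, per x position
running = [0] * 5
stacked_tops = {}
for name, values in data.items():
    running = [r + v for r, v in zip(running, values)]
    stacked_tops[name] = running

print(stacked_tops['y3'])  # [6, 9, 12, 15, 18]
```

The top of the last band is the sum of all three columns, which is why the y-axis extends to 18 even though no single column exceeds 7.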
To show each column's data series as-is, without stacking (i.e., starting from the y-coordinate of zero):
fig = px.line(df, x='x', y=['y1', 'y2', 'y3'])
Short answer:
Just add the following to your setup to obtain the desired behavior of px.area:
fig.update_traces(stackgroup = None, fill = 'tozeroy')
The details:
px.area produces a stacked area chart through the trace attribute stackgroup which by default is set to 'one'. To obtain what you're requesting here, you can update this attribute to None with:
fig.update_traces(stackgroup = None)
Plot 1 - Unstacked traces
As you can see, this will display the values in the way you're requesting, but that won't help you much since the area fill colors are missing. By default, the fill attribute of the traces is set to tonexty. Other options are:
['none', 'tozeroy', 'tozerox', 'tonexty', 'tonextx', 'toself', 'tonext']
And if you set fill to tozeroy with fig.update_traces(fill = 'tozeroy') you'll get what you're looking for:
Plot 2 - Almost there, but what's up with the colors?
This is going to look a bit strange for your particular dataset since the resulting areas will cover each other all the way and because the area colors are opaque. But you can see by the colors of the legend that the color setup is still the same as in your plot. You can verify this by using the following dataset:
df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,2],'y3':[3,4,5,6,1]}
Plot 3 - Bingo!
Complete code:
import pandas as pd
import plotly.express as px
# df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,6],'y3':[3,4,5,6,7]}
df={'x':[1,2,3,4,5],'y1':[1,2,3,4,5],'y2':[2,3,4,5,2],'y3':[3,4,5,6,1]}
df=pd.DataFrame(df)
fig = px.area(df, x="x", y=['y1','y2','y3'])
#fig.update_traces(stackgroup = None)
#fig.update_traces(fill = 'tozeroy')
fig.update_traces(stackgroup = None, fill = 'tozeroy')
fig.show()

How to draw a grid in a bar-plot created with plt.vlines()

I want to create a bar plot in Python. I want this plot to be beautiful, though, and I don't like the look of matplotlib's axes.bar() function. Therefore, I have decided to use plt.vlines(). The challenge here is that my x-data is a list that contains strings and not numerical data. When I plot my graph, the spacing between the two columns (in my example column 2 = 0) is pretty big:
Furthermore, I want a grid, including minor grid lines. I know how to get all of this if my data were numerical. But since my x-data contains strings, I don't know how to set x_max. Any suggestions?
Internally, the positions of the labels are numbered 0,1,... So setting the x-limits a bit before 0 and after the last, shows them more centered.
Usually, bars are drawn with their 'feet' on the ground, which can be set via plt.ylim(0, ...). Minor ticks can be positioned for example at multiples of 0.2. Setting the length of the ticks to zero lets the position count for the grid, but suppresses the tick mark.
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator
import numpy as np
labels = ['Test 1', 'Test 2']
values = [1, 0.7]
fig, ax = plt.subplots()
plt.vlines(labels, 0, values, colors='dodgerblue', alpha=.4, lw=7)
plt.xlim(-0.5, len(labels) - 0.5) # add some padding left and right of the bars
plt.ylim(0, 1.1) # bars usually have their 0 at the bottom
ax.xaxis.set_minor_locator(MultipleLocator(.2))
plt.tick_params(axis='x', which='both', length=0) # ticks not shown, but position serves for gridlines
plt.grid(axis='both', which='both', ls=':') # optionally set the linestyle of the grid
plt.show()
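To see the 0, 1, ... numbering of the string labels for yourself, you can inspect the tick positions after plotting. This is only a small sketch with made-up labels and values, run on a headless backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

labels = ['Test 1', 'Test 2', 'Test 3']  # illustrative labels
values = [1, 0.7, 0.4]                   # illustrative values

fig, ax = plt.subplots()
ax.vlines(labels, 0, values, lw=7)

# Each string label is mapped to an integer position on the x-axis
tick_positions = [float(t) for t in ax.get_xticks()]

# Hence the limits can be set numerically, half a unit beyond the outer labels
ax.set_xlim(-0.5, len(labels) - 0.5)
x_limits = ax.get_xlim()
```

With three labels, the positions are 0, 1 and 2, so the limits end up at -0.5 and 2.5.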

Overlay two seaborn barplots of different size

Say there are two datasets: a big "background" set, and much smaller "foreground" set. The foreground set comes from the background, but might be much smaller.
I am interested in showing the entire background distribution in an ordered sns.barplot, and have the foreground set a brighter contrasting color to draw attention to these samples.
The best solution I could find is to display one graph on top of the other, but what happens is the graph shrinks down to the smaller domain. Here's what I mean:
import matplotlib.pyplot as plt
import seaborn as sns
# Load the example car crash dataset
crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False)
# states of interest
txcahi = crashes[crashes['abbrev'].isin(['TX','CA','HI'])]
# Plot the total crashes
f, ax = plt.subplots(figsize=(10, 5))
plt.xticks(rotation=90, fontsize=10)
sns.barplot(y="total", x="abbrev", data=crashes, label="Total", color="lightgray")
# overlay special states onto gray plot as red bars
sns.barplot(y="total", x="abbrev", data=txcahi, label="Total", color="red")
sns.despine(left=True, bottom=True)
This data produces:
But it should look like this (ignore stylistic differences):
Why doesn't this approach work, and what would be a better way to accomplish this?
A seaborn barplot simply places its n bars at positions 0 to n-1, so the second, smaller plot does not line up with the first. If you use a matplotlib bar plot instead, which is unit-aware (from matplotlib 2.2 onwards), it works as expected.
import matplotlib.pyplot as plt
import seaborn as sns
# Load the example car crash dataset
crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False)
# states of interest
txcahi = crashes[crashes['abbrev'].isin(['TX','CA','HI'])]
# Plot the total crashes
f, ax = plt.subplots(figsize=(10, 5))
plt.xticks(rotation=90, fontsize=10)
plt.bar(height="total", x="abbrev", data=crashes, label="Total", color="lightgray")
plt.bar(height="total", x="abbrev", data=txcahi, label="Total", color="red")
sns.despine(left=True, bottom=True)
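The effect of the unit-aware axis can be checked without the dataset. The sketch below uses a handful of made-up state totals (not the real car_crashes values) and compares the left edges of the highlighted bars with those of the matching background bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values for illustration only
states = ["MT", "TX", "CA", "HI", "DC"]
totals = [21.4, 19.4, 12.0, 17.5, 5.9]
highlight = ["TX", "CA", "HI"]

fig, ax = plt.subplots()
background = ax.bar(states, totals, color="lightgray")

fg_totals = [totals[states.index(s)] for s in highlight]
foreground = ax.bar(highlight, fg_totals, color="red")

# The category axis remembers where each label lives, so the red bars
# are drawn at the same x positions as their gray counterparts.
bg_left = {s: p.get_x() for s, p in zip(states, background.patches)}
fg_left = {s: p.get_x() for s, p in zip(highlight, foreground.patches)}
offsets = [abs(fg_left[s] - bg_left[s]) for s in highlight]
```

All offsets come out as zero, i.e. each red bar sits exactly on top of the gray bar with the same label.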

sns.clustermap ticks are missing

I'm trying to visualize what filters are learning in a CNN text classification model. To do this, I extracted feature maps of text samples right after the convolutional layer, and for a size 3 filter, I got a (filter_num)*(length_of_sentences) sized tensor.
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
This code results in the following plot, where I can't see all the ticks on the y-axis. This is necessary because I need to see which filters learn which information. Is there any way to properly display all the ticks on the y-axis?
kwargs from sns.clustermap get passed on to sns.heatmap, which has an option yticklabels, whose documentation states (emphasis mine):
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer, so it will plot every nth label. We want every label, so we want to set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()

Second Matplotlib figure doesn't save to file

I've drawn a plot that looks something like the following:
It was created using the following code:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# 1. Plot a figure consisting of 3 separate axes
# ==============================================
plotNames = ['Plot1','Plot2','Plot3']
figure, axisList = plt.subplots(len(plotNames), sharex=True, sharey=True)
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range('2015-01-01','2015-12-31',freq='D')
tempDF['value'] = np.random.randn(tempDF['date'].size)
tempDF['value2'] = np.random.randn(tempDF['date'].size)
for i in range(len(plotNames)):
    axisList[i].plot_date(tempDF['date'],tempDF['value'],'b-',xdate=True)
# 2. Create a new single axis in the figure. This new axis sits over
# the top of the axes drawn previously. Make all the components of
# the new single axis invisible except for the x and y labels.
big_ax = figure.add_subplot(111)
big_ax.set_facecolor('none') # set_axis_bgcolor was removed in matplotlib 2.2
big_ax.set_xlabel('Date',fontweight='bold')
big_ax.set_ylabel('Random normal',fontweight='bold')
big_ax.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
big_ax.spines['right'].set_visible(False)
big_ax.spines['top'].set_visible(False)
big_ax.spines['left'].set_visible(False)
big_ax.spines['bottom'].set_visible(False)
# 3. Plot a separate figure
# =========================
figure2,ax2 = plt.subplots()
ax2.plot_date(tempDF['date'],tempDF['value2'],'-',xdate=True,color='green')
ax2.set_xlabel('Date',fontweight='bold')
ax2.set_ylabel('Random normal',fontweight='bold')
# Save plot
# =========
plt.savefig('tempPlot.png',dpi=300)
Basically, the rationale for plotting the whole picture is as follows:
1. Create the first figure and plot 3 separate axes using a loop.
2. Plot a single axis in the same figure to sit on top of the graphs drawn previously. Label the x and y axes. Make all other aspects of this axis invisible.
3. Create a second figure and plot data on a single axis.
The plot displays just as I want when using jupyter-notebook but when the plot is saved, the file contains only the second figure.
I was under the impression that plots could have multiple figures and that figures could have multiple axes. However, I suspect I have a fundamental misunderstanding of the differences between plots, subplots, figures and axes. Can someone please explain what I'm doing wrong and how to get the whole image to save to a single file?
Matplotlib does not have "plots". In that sense,
plots are figures
subplots are axes
During the runtime of a script you can have as many figures as you wish. Calling plt.savefig() will save the currently active figure, i.e. the figure you would get by calling plt.gcf().
You can save any other figure either by providing a figure number num:
plt.figure(num)
plt.savefig("output.png")
or by having a reference to the figure object fig1
fig1.savefig("output.png")
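Both ways of saving can be tried in a small self-contained sketch. The data, file names and temporary directory below are placeholders chosen for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots()
ax1.plot([0, 1, 2], [0, 1, 4])

fig2, ax2 = plt.subplots()
ax2.plot([0, 1, 2], [4, 1, 0])

# The most recently created figure is the current one
current_is_second = plt.gcf() is fig2

out_dir = tempfile.mkdtemp()
path1 = os.path.join(out_dir, "first.png")
path2 = os.path.join(out_dir, "second.png")

fig1.savefig(path1)  # save through the figure object
plt.savefig(path2)   # saves the current figure, i.e. fig2

# Alternatively, make the first figure current again via its number
plt.figure(fig1.number)
current_is_first = plt.gcf() is fig1
```

Keeping a reference to each figure and calling its own savefig() avoids having to track which figure is currently active.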
In order to save several figures into one file, one could go the way detailed here: Python saving multiple figures into one PDF file.
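One common way to do this, which may or may not be exactly what the linked answer describes, is matplotlib's PdfPages class, where each figure becomes one page of the PDF. A minimal sketch with placeholder data and a temporary output path:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

fig1, ax1 = plt.subplots()
ax1.plot([0, 1, 2], [0, 1, 4])

fig2, ax2 = plt.subplots()
ax2.plot([0, 1, 2], [4, 1, 0])

pdf_path = os.path.join(tempfile.mkdtemp(), "both_figures.pdf")
with PdfPages(pdf_path) as pdf:
    pdf.savefig(fig1)  # page 1
    pdf.savefig(fig2)  # page 2
    page_count = pdf.get_pagecount()
```

Note that this gives one file with two pages, not a single image containing both figures.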
Another option would be not to create several figures, but a single one, using subplots,
fig = plt.figure()
ax = fig.add_subplot(611)
ax2 = fig.add_subplot(612)
ax3 = fig.add_subplot(613)
ax4 = fig.add_subplot(212)
and then plot the respective graphs to those axes using
ax.plot(x,y)
or in the case of a pandas dataframe df
df.plot(x="column1", y="column2", ax=ax)
This second option can of course be generalized to arbitrary axes positions using subplots on grids. This is detailed in the matplotlib user's guide, Customizing Location of Subplot Using GridSpec.
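As a rough sketch of the grid approach, the same layout (three small axes in the upper half, one larger axes in the lower half) could be written with a 6-row GridSpec. The variable names are illustrative, and fig.add_gridspec assumes a reasonably recent matplotlib (3.0 or later).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
gs = fig.add_gridspec(nrows=6, ncols=1)

# Rows 0, 1 and 2 each hold one small axes
small_axes = [fig.add_subplot(gs[row, 0]) for row in range(3)]
# Rows 3 to 5 are merged into one larger axes
big_ax = fig.add_subplot(gs[3:, 0])

small_height = small_axes[0].get_position().height
big_height = big_ax.get_position().height
```

Slicing the GridSpec (gs[3:, 0]) is what lets one axes span several grid cells.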
Furthermore, it is possible to position an axes (a subplot so to speak) at any position in the figure using fig.add_axes([left, bottom, width, height]) (where left, bottom, width, height are in figure coordinates, ranging from 0 to 1).
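A short sketch of free positioning with fig.add_axes, using arbitrary example rectangles:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()

# [left, bottom, width, height] as fractions of the figure size
main_ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
inset_ax = fig.add_axes([0.6, 0.6, 0.25, 0.25])

main_ax.plot([0, 1, 2], [0, 1, 4])
inset_ax.plot([0, 1, 2], [4, 1, 0])

# The axes report back the rectangle they were given
inset_bounds = inset_ax.get_position().bounds
```

Here the second axes is drawn as an inset in the upper right corner of the first one; both belong to the same figure and are saved together by fig.savefig().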