Formatting Currency in Matplotlib - matplotlib

ax.bar_label(rects2, padding=3, fmt='$%.2f')
Able to get the '$' sign and two decimal places, but can't seem to add the separator.
Tried:
labels=[f'${x:,.2f}']

The code below shows how to format bar labels as currency with a thousands separator and two decimal places:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
langs = ['A', 'B', 'C', 'D', 'E']
students = [2300,1700,3500,2900,1200]
bars = ax.bar(langs,students)
ax.bar_label(bars, labels=[f'${x:,.2f}' for x in bars.datavalues])
plt.show()
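The %-style fmt string cannot add the separator because printf-style formatting in Python has no thousands-separator flag, while the format-spec syntax used by f-strings does (the `,` option). A minimal sketch of a reusable helper follows; the name `currency_labels` and its parameters are made up for illustration and are not part of Matplotlib.

```python
def currency_labels(values, symbol="$", decimals=2):
    """Format each number as currency with a thousands separator."""
    return [f"{symbol}{v:,.{decimals}f}" for v in values]

# Same numbers as the bar heights in the example above.
labels = currency_labels([2300, 1700, 3500, 2900, 1200])
print(labels)
```

With the chart above this could be used as ax.bar_label(bars, labels=currency_labels(bars.datavalues)). Matplotlib 3.7 and later should also accept a {}-style format string directly, for example fmt='${:,.2f}', though that is worth checking against the installed version.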

matplotlib stacked bar chart with zero centered

I have a dataset like below.
T/F  Value  category
T    1      A
F    3      B
T    5      C
F    7      A
T    8      B
...  ...    ...
I want to draw a zero-centered bar chart in which rows with the same category share the same x position: the F values are drawn as bars below the horizontal line and the T values as bars above it.
How can I make this chart with matplotlib.pyplot or another library?
An example would be appreciated.
One approach involves making the False values negative, and then creating a Seaborn barplot with T/F as hue. You might want to make a copy of the data if you can't change the original.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
data = pd.DataFrame({'T/F': ['T', 'F', 'T', 'F', 'T'],
'Value': [1, 3, 5, 7, 8],
'category': ['A', 'B', 'C', 'A', 'B']})
data['Value'] = np.where(data['T/F'] == 'T', data['Value'], -data['Value'])
ax = sns.barplot(data=data, x='category', y='Value', hue='T/F', dodge=False, palette='turbo')
ax.axhline(0, lw=2, color='black')
plt.tight_layout()
plt.show()
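Seaborn's barplot aggregates repeated category/hue combinations with the mean by default. If totals are wanted instead, the same idea can be sketched in plain pandas and Matplotlib, assuming the column names from the question. The names `signed` and `totals` and the two colors are illustrative choices, not part of the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({'T/F': ['T', 'F', 'T', 'F', 'T'],
                     'Value': [1, 3, 5, 7, 8],
                     'category': ['A', 'B', 'C', 'A', 'B']})

# Work on a copy: keep T values positive, flip F values to negative.
signed = data.copy()
signed['Value'] = signed['Value'].where(signed['T/F'] == 'T', -signed['Value'])

# One row per category, one column per T/F flag, summing repeated rows.
totals = signed.pivot_table(index='category', columns='T/F',
                            values='Value', aggfunc='sum', fill_value=0)

fig, ax = plt.subplots()
for flag, color in [('T', 'tab:blue'), ('F', 'tab:orange')]:
    # Both series share the same x positions, so the bars meet at zero.
    ax.bar(totals.index, totals[flag], color=color, label=flag)
ax.axhline(0, lw=2, color='black')
ax.legend(title='T/F')
```

Call plt.show() afterwards to display the figure.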

How to add markers on legend and graph - matplotlib

I have the following code:
from matplotlib import pyplot as plt
import seaborn as sns
fig = plt.figure()
fig.suptitle('Average GPA and Standard Deviation per course combination', fontsize=15)
plt.xlabel('Standard Deviation of Average GPA', fontsize=12)
plt.ylabel('Average GPA', fontsize=12)
colors = ['#E74C3C', '#76448A', '#3498DB', '#17A589', '#F1C40F', '#F39C12', '#CA6F1E', '#B3B6B7', '#34495E',
'#F5B7B1']
marker = ['.','v','^','1','2','8','p','P','x','X']
g = sns.scatterplot(x=all_stdev, y=all_gpas, hue=final_courses)
g.legend(loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.show()
all_stdev, all_gpas, and final_courses are lists that change every time depending on the user, since they are recommendations based on the user's input. The result I get for a particular student is the following:
I tried adding markers to make the results easier for the user to understand, but nothing I tried worked: the markers would appear in the graph all in the same color, while the legend would still show all the colors as above. I need the markers in both the graph and the legend. Is there a way to do it? I have a list of markers that I would like to use in the code provided.
You need to pass the markers and the colors as parameters to the scatterplot.
There is still another problem with the markers. Seaborn complains: Filled and line art markers cannot be mixed. So you need to select either all filled or all line-art markers.
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
fig = plt.figure()
fig.suptitle('Average GPA and Standard Deviation per course combination', fontsize=15)
plt.xlabel('Standard Deviation of Average GPA', fontsize=12)
plt.ylabel('Average GPA', fontsize=12)
colors = ['#E74C3C', '#76448A', '#3498DB', '#17A589', '#F1C40F', '#F39C12', '#CA6F1E', '#B3B6B7', '#34495E', '#F5B7B1']
# marker = ['.', 'v', '^', '1', '2', '8', 'p', 'P', 'x', 'X']
marker = ['o', 'v', '^', '8', '*', 'P', 'D', 'X', 's', 'p']
N = 30
final_courses = np.random.randint(1,11, N) * 10
all_stdev = np.random.uniform(0, 2, N)
all_gpas = np.random.uniform(3, 4, N)
g = sns.scatterplot(x=all_stdev, y=all_gpas, hue=final_courses, style=final_courses, palette=colors, markers=marker)
g.legend(loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.show()
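Because the course lists change with every user, plain lists of colors and markers are assigned in order of whichever courses happen to be present. Seaborn's palette and markers parameters also accept dictionaries keyed by the hue/style levels, so if the full set of possible courses is known up front, one possible refinement is to build the mappings once. The helper name `build_style_maps` and the example course codes below are hypothetical; the helper simply cycles through the colors and markers if there are more courses than entries.

```python
from itertools import cycle

def build_style_maps(levels, colors, markers):
    """Map each distinct level to a fixed color and a fixed marker."""
    unique_levels = sorted(set(levels))
    palette = dict(zip(unique_levels, cycle(colors)))
    marker_map = dict(zip(unique_levels, cycle(markers)))
    return palette, marker_map

colors = ['#E74C3C', '#76448A', '#3498DB']
markers = ['o', 'v', '^']  # all filled markers, so they can be combined
# Hypothetical catalogue of every course code that could be recommended.
all_courses = [30, 10, 20, 10, 40]
palette, marker_map = build_style_maps(all_courses, colors, markers)
print(palette)
print(marker_map)
```

The two dictionaries would then be passed as sns.scatterplot(..., hue=final_courses, style=final_courses, palette=palette, markers=marker_map).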

Not able to create a 3x3 grid of subplots to visualize 9 Series individually

I want to have a 3x3 grid of subplots to visualize each Series individually.
I first created some toy data:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style='whitegrid', rc={"figure.figsize":(14,6)})
rs = np.random.RandomState(444)
dates = pd.date_range(start="2009-01-01", end='2019-12-31', freq='1D')
values = rs.randn(4017,12).cumsum(axis=0)
data = pd.DataFrame(values, dates, columns =['a','b','c','d','e','f','h','i','j','k','l','m'])
Here is the first code I wrote:
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True)
for col in n_cols:
    ax = data[col].plot()
With these lines of code I get the 3x3 grid, but all the columns are plotted on the same Axes, the one in the bottom right corner.
Bottom Right Corner with all Lines
Here is the second thing I tried:
n_cols = ['a', 'b', 'c', 'd', 'e', 'f', 'h', 'i', 'j']
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True)
for col in n_cols:
    for i in range(3):
        for j in range(3):
            ax[i,j].plot(data[col])
But now I get all the columns plotted on every single subplotAxes.
All AxesSubplot with same lines
Then I tried something like this:
fig, ax = plt.subplots(sharex=True, sharey=True)
for col in n_cols:
    for i in range(3):
        for j in range(3):
            ax[i,j].add_subplot(data[col])
But I get:
TypeError: 'AxesSubplot' object is not subscriptable
I am sorry but can't figure out what to do.
Currently you're plotting each series in each of the subplots:
for col in n_cols:
    for i in range(3):
        for j in range(3):
            ax[i,j].plot(data[col])
Following your example code, here is a way to only plot a single series per subplot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
rs = np.random.RandomState(444)
dates = pd.date_range(start="2009-01-01", end='2019-12-31', freq='1D')
values = rs.randn(4017,12).cumsum(axis=0)
data = pd.DataFrame(values, dates, columns =['a','b','c','d','e','f','h','i','j','k','l','m'])
n_cols = ['a', 'b', 'c', 'd', 'e', 'f', 'h', 'i', 'j']
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True)
for i in range(3):
    for j in range(3):
        col_name = n_cols[i*3+j]
        ax[i,j].plot(data[col_name])
plt.show()
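An equivalent way to pair each column with its own subplot, sketched here with the same toy data, is to iterate over the flattened Axes array (ax.flat, a standard attribute of the NumPy array returned by plt.subplots) together with the column names. The variable names `axes`, `axis`, and `col_name` are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rs = np.random.RandomState(444)
dates = pd.date_range(start="2009-01-01", end="2019-12-31", freq="1D")
values = rs.randn(len(dates), 12).cumsum(axis=0)
data = pd.DataFrame(values, dates,
                    columns=['a', 'b', 'c', 'd', 'e', 'f', 'h', 'i', 'j', 'k', 'l', 'm'])
n_cols = ['a', 'b', 'c', 'd', 'e', 'f', 'h', 'i', 'j']

fig, axes = plt.subplots(3, 3, sharex=True, sharey=True)
# axes.flat walks the 3x3 grid row by row, so zip pairs one column per subplot.
for axis, col_name in zip(axes.flat, n_cols):
    axis.plot(data[col_name])
    axis.set_title(col_name)
```

This avoids the manual i*3+j index arithmetic; call plt.show() afterwards to display the grid.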

Obtaining the exact data coordinates of seaborn boxplot boxes

I have a seaborn boxplot (sns.boxplot) on which I would like to add some points. For example, say I have this pandas DataFrame:
[In] import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'Property 1':['a']*100+['b']*100,
'Property 2': ['w', 'x', 'y', 'z']*50,
'Value': np.random.normal(size=200)})
df.head(3)
[Out]   Property 1 Property 2     Value
     0          a          w  1.421380
     1          a          x -1.034465
     2          a          y  0.212911
[In] df.shape
[Out] (200, 3)
I can easily generate a boxplot with seaborn:
[In] sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
[Out]
Now say I want to add markers for a specific case in my sample. I can get close with this:
[In] specific_case = pd.DataFrame([['a', 'w', '0.5'],
['a', 'x', '0.2'],
['a', 'y', '0.1'],
['a', 'z', '0.3'],
['b', 'w', '-0.5'],
['b', 'x', '-0.2'],
['b', 'y', '0.3'],
['b', 'z', '0.5']
],
columns = df.columns
)
[In] sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
plt.plot(np.arange(-0.25, 3.75, 0.5),
specific_case['Value'].values, 'ro')
[Out]
That is unsatisfactory, of course.
I then used this answer, which talks about getting the bbox, and this tutorial about converting display coordinates into data coordinates to write this function:
[In] def get_x_coordinates_of_seaborn_boxplot(ax, x_or_y):
         display_coordinates = []
         inv = ax.transData.inverted()
         for c in ax.get_children():
             if type(c) == mpl.patches.PathPatch:
                 if x_or_y == 'x':
                     display_coordinates.append(
                         (c.get_extents().xmin+c.get_extents().xmax)/2)
                 if x_or_y == 'y':
                     display_coordinates.append(
                         (c.get_extents().ymin+c.get_extents().ymax)/2)
         return inv.transform(tuple(display_coordinates))
That works great for my first hue, but not at all for my second:
[In] ax = sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
coords = get_x_coordinates_of_seaborn_boxplot(ax, 'x')
plt.plot(coords, specific_case['Value'].values, 'ro')
[Out]
How can I get the data coordinates of all my boxes?
I'm unsure about the purpose of those transformations, but it seems the real problem is just to plot the points from specific_case at the correct positions. The x-coordinate of every box is shifted by 0.2 from the whole number. (That is because each group of boxes is 0.8 wide by default and you have 2 boxes, which makes each 0.4 wide; half of that is 0.2.)
You then need to arrange the x values to fit to those of the specific_case dataframe.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'Property 1':['a']*100+['b']*100,
'Property 2': ['w', 'x', 'y', 'z']*50,
'Value': np.random.normal(size=200)})
# Values are numeric (not strings) so they share the boxplot's y scale
specific_case = pd.DataFrame([['a', 'w', 0.5],
                              ['a', 'x', 0.2],
                              ['a', 'y', 0.1],
                              ['a', 'z', 0.3],
                              ['b', 'w', -0.5],
                              ['b', 'x', -0.2],
                              ['b', 'y', 0.3],
                              ['b', 'z', 0.5]
                             ], columns=df.columns)
ax = sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
X = np.repeat(np.atleast_2d(np.arange(4)),2, axis=0)+ np.array([[-.2],[.2]])
ax.plot(X.flatten(), specific_case['Value'].values, 'ro', zorder=4)
plt.show()
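The 0.2 offset generalizes to any number of hue levels. Assuming Seaborn's default total group width of 0.8 and evenly dodged boxes (an assumption about the defaults, not something read back from the plot), each of n hue levels gets a box of width 0.8/n, centered at the category index plus (k - (n - 1)/2) * 0.8/n for hue index k. The function name `dodged_centers` is invented for this sketch.

```python
def dodged_centers(n_categories, n_hues, group_width=0.8):
    """Return x centers ordered hue by hue, then category by category."""
    box_width = group_width / n_hues
    centers = []
    for k in range(n_hues):
        # Offset of hue k from the integer category position.
        offset = (k - (n_hues - 1) / 2) * box_width
        centers.extend(cat + offset for cat in range(n_categories))
    return centers

xs = dodged_centers(n_categories=4, n_hues=2)
print([round(x, 2) for x in xs])
```

The ordering matches the rows of specific_case (all 'a' rows first, then all 'b' rows), so the result could be passed as ax.plot(dodged_centers(4, 2), specific_case['Value'].values, 'ro', zorder=4).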
I got it figured out:
In your code, do the following to extract the x-coordinates for each hue. I did not do it for y, but the logic should be the same.
Create two lists to hold your x-coordinates:
display_coordinates_1=[]
display_coordinates_2=[]
Inside your for loop that starts with:
for c in ax.get_children():
Use the following:
display_coordinates_1.append(c.get_extents().x0)
You need x0 for the x-coordinate of the boxplots under the first hue.
The following gives you the x-coordinates for the boxplots in the second hue. Note the use of x1 here:
display_coordinates_2.append(c.get_extents().x1)
Lastly, after you pass the two lists through inv.transform(), make sure you select every other value, since for x-coordinates each list has 6 outputs and you want the ones at indices 0, 2, 4, i.e. [::2].
Hope this helps.

Pandas bar plot -- specify bar color by column

Is there a simple way to specify bar colors by column name using the Pandas DataFrame.plot(kind='bar') method?
I have a script that generates multiple DataFrames from several different data files in a directory. For example it does something like this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pds
data_files = ['a', 'b', 'c', 'd']
df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])
df1.plot(kind='bar', ax=plt.subplot(121))
df2.plot(kind='bar', ax=plt.subplot(122))
plt.show()
With the following output:
Unfortunately, the column colors aren't consistent for each label in the different plots. Is it possible to pass in a dictionary of (filenames:colors), so that any particular column always has the same color? For example, I could imagine creating this by zipping up the filenames with the Matplotlib color cycle:
data_files = ['a', 'b', 'c', 'd']
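# 'axes.color_cycle' was removed in Matplotlib 2.0; 'axes.prop_cycle' replaces it
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(list(zip(data_files, colors)))
[('a', '#1f77b4'), ('b', '#ff7f0e'), ('c', '#2ca02c'), ('d', '#d62728')]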
I could figure out how to do this directly with Matplotlib: I just thought there might be a simpler, built-in solution.
Edit:
Below is a partial solution that works in pure Matplotlib. However, I'm using this in an IPython notebook that will be distributed to non-programmer colleagues, and I'd like to minimize the amount of plotting code.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pds
data_files = ['a', 'b', 'c', 'd']
# 'axes.color_cycle' was removed in Matplotlib 2.0; 'axes.prop_cycle' replaces it
mpl_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
colors = dict(zip(data_files, mpl_colors))
def bar_plotter(df, colors, sub):
    ncols = df.shape[1]
    width = 1./(ncols+2.)
    starts = df.index.values - width*ncols/2.
    plt.subplot(120+sub)
    for n, col in enumerate(df):
        plt.bar(starts + width*n, df[col].values, color=colors[col],
                width=width, label=col)
    plt.xticks(df.index.values)
    plt.grid()
    plt.legend()
df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])
bar_plotter(df1, colors, 1)
bar_plotter(df2, colors, 2)
plt.show()
You can pass a list as the colors. This will require a little bit of manual work to get it to line up, unlike if you could pass a dictionary, but may be a less cluttered way to accomplish your goal.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pds
data_files = ['a', 'b', 'c', 'd']
df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])
color_list = ['b', 'g', 'r', 'c']
df1.plot(kind='bar', ax=plt.subplot(121), color=color_list)
df2.plot(kind='bar', ax=plt.subplot(122), color=color_list[1:])
plt.show()
EDIT
Ajean came up with a simple way to return a list of the correct colors from a dictionary:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pds
data_files = ['a', 'b', 'c', 'd']
color_list = ['b', 'g', 'r', 'c']
d2c = dict(zip(data_files, color_list))
df1 = pds.DataFrame(np.random.rand(4,3), columns=data_files[:-1])
df2 = pds.DataFrame(np.random.rand(4,3), columns=data_files[1:])
# Wrap map() in list(): in Python 3 it returns an iterator, not a list of colors
df1.plot(kind='bar', ax=plt.subplot(121), color=list(map(d2c.get, df1.columns)))
df2.plot(kind='bar', ax=plt.subplot(122), color=list(map(d2c.get, df2.columns)))
plt.show()
Pandas version 1.1.0 makes this easier. You can pass a dictionary to specify a different color for each column in the pandas.DataFrame.plot.bar() function.
Here is an example:
import pandas as pd
df1 = pd.DataFrame({'a': [1.2, .8, .9], 'b': [.2, .9, .7]})
df2 = pd.DataFrame({'b': [0.2, .5, .4], 'c': [.5, .6, .7], 'd': [1.1, .6, .7]})
color_dict = {'a':'green', 'b': 'red', 'c':'blue', 'd': 'cyan'}
df1.plot.bar(color = color_dict)
df2.plot.bar(color = color_dict)
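To tie this back to the question's idea of zipping file names with the color cycle, here is a hedged sketch: build the dictionary once from Matplotlib's current property cycle and reuse it for every DataFrame. It assumes pandas 1.1.0 or later and a cycle with at least as many colors as there are file names; `color_dict`, `cycle_colors`, and `axes` are just illustrative names.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data_files = ['a', 'b', 'c', 'd']
# Colors of the active style's property cycle, as a plain list.
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
color_dict = dict(zip(data_files, cycle_colors))

df1 = pd.DataFrame(np.random.rand(4, 3), columns=data_files[:-1])
df2 = pd.DataFrame(np.random.rand(4, 3), columns=data_files[1:])

fig, axes = plt.subplots(1, 2, sharey=True)
# The dictionary may contain more keys than a frame has columns.
df1.plot.bar(ax=axes[0], color=color_dict)
df2.plot.bar(ax=axes[1], color=color_dict)
```

Column 'b' should then get the same color in both subplots, regardless of its position in each DataFrame. Call plt.show() afterwards to display the figure.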