Matplotlib table size and position - matplotlib

I'm trying to make a table with dimensions Nx7, where N is a variable.
It is quite challenging to size a table properly with matplotlib.
I want to put it in the center of the plot, with a title located just above the table.
Here's my code:
import numpy as np
import matplotlib.pyplot as plt
data = [
    ['A', 'B', 'C', 'D', 'E', 'F'],
    ['100%', '200%', 'O', 'X', '1.2%', '100', '200'],
    ['100%', '200%', 'O', 'X', '1.2%', '100', '200'],
    ['100%', '200%', 'O', 'X', '1.2%', '100', '200'],
    ['100%', '200%', 'O', 'X', '1.2%', '100', '200'],
    ['100%', '200%', 'O', 'X', '1.2%', '100', '200']]
fig, ax1 = plt.subplots(dpi=200)
column_headers = data.pop(0)
row_headers = [x.pop(0) for x in data]
rcolors = np.full(len(row_headers), 'linen')
ccolors = np.full(len(column_headers), 'lavender')
cell_text = []
for row in data:
    cell_text.append([x for x in row])
table = ax1.table(cellText=cell_text,
                  cellLoc='center',
                  rowLabels=row_headers,
                  rowColours=rcolors,
                  rowLoc='center',
                  colColours=ccolors,
                  colLabels=column_headers,
                  loc='center')
fig.tight_layout()
table.scale(1, 0.5)
table.set_fontsize(16)
# Hide axes
ax1.get_xaxis().set_visible(False)
ax1.get_yaxis().set_visible(False)
# Add title
ax1.set_title('{}\n({})'.format(title, subtitle), weight='bold', size=14, color='k')
fig.tight_layout()
plt.savefig(filename)
There are several problems with my code:
The title overlaps the table.
The whole figure is shifted to the right of center. (The left side of the resulting image is empty space.)
The text in the table is not size 16. (It is much smaller than the title.)
Thank you.

Here is some code to draw the table.
Some remarks:
Matplotlib's support for tables is rather elementary. It is mostly meant for adding some text in table form to an existing plot.
Using .pop() makes the code difficult to reason about, because it modifies the lists in place. In Python, it is more common to create new lists from the given ones.
Since an adequate plot size depends heavily on the number of rows, one possibility is to calculate the height as a multiple of the row count. The exact values depend on the complete table, so it makes sense to experiment a bit. The values below seem to work well for the given example data.
dpi and a tight bounding box can be passed as parameters to savefig().
Different matplotlib versions might behave slightly differently. The code below was tested with matplotlib 3.3.3.
import numpy as np
import matplotlib.pyplot as plt
N = 10
data = [['A', 'B', 'C', 'D', 'E', 'F']] + [['100%', '200%', 'O', 'X', '1.2%', '100', '200']] * N
column_headers = data[0]
row_headers = [row[0] for row in data[1:]]
cell_text = [row[1:] for row in data[1:]]
fig, ax1 = plt.subplots(figsize=(10, 2 + N / 2.5))
rcolors = np.full(len(row_headers), 'linen')
ccolors = np.full(len(column_headers), 'lavender')
table = ax1.table(cellText=cell_text,
                  cellLoc='center',
                  rowLabels=row_headers,
                  rowColours=rcolors,
                  rowLoc='center',
                  colColours=ccolors,
                  colLabels=column_headers,
                  loc='center')
table.scale(1, 2)
table.set_fontsize(16)
ax1.axis('off')
title = "demo title"
subtitle = "demo subtitle"
ax1.set_title(f'{title}\n({subtitle})', weight='bold', size=14, color='k')
plt.savefig("demo_table.png", dpi=200, bbox_inches='tight')
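The sizing rule used above (fixed width, height growing with the number of rows) can be pulled out into a small helper, which makes the constants easier to tune. This is only a sketch: table_figsize is a hypothetical name, and the defaults 10, 2 and 2.5 are simply the values from the code above, not anything matplotlib prescribes.

```python
def table_figsize(n_rows, width=10.0, base_height=2.0, rows_per_inch=2.5):
    """Return (width, height) in inches for a figure holding an n_rows table.

    Hypothetical helper; the defaults mirror figsize=(10, 2 + N / 2.5).
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    return (width, base_height + n_rows / rows_per_inch)


# The height grows linearly with the row count.
for n in (5, 10, 20):
    print(n, table_figsize(n))
```

It would be used as plt.subplots(figsize=table_figsize(N)). If the table text still comes out smaller than requested, the likely cause is the table's automatic font sizing, which shrinks the font until the text fits its cell. Calling table.auto_set_font_size(False) before table.set_fontsize(16) turns that off, at the cost of text possibly overflowing narrow cells.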

Related

Formatting Currency in Matplotlib

ax.bar_label(rects2, padding=3, fmt='$%.2f')
Able to get the '$' sign and two decimal places, but can't seem to add the separator.
Tried:
labels=[f'${x:,.2f}']
The code below shows how to format numbers as currency labels with a thousands separator and two decimals:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
langs = ['A', 'B', 'C', 'D', 'E']
students = [2300,1700,3500,2900,1200]
bars = ax.bar(langs,students)
ax.bar_label(bars, labels=[f'${x:,.2f}' for x in bars.datavalues])
plt.show()
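The format spec from the question was already correct; what was missing is the loop that supplies x. A minimal sketch of the label-building step on its own, with no plotting involved (currency_labels is a hypothetical helper name):

```python
def currency_labels(values):
    """Format numbers as strings such as '$1,234.56' (hypothetical helper)."""
    # ',' adds the thousands separator, '.2f' fixes two decimal places.
    return [f'${v:,.2f}' for v in values]


students = [2300, 1700, 3500, 2900, 1200]
labels = currency_labels(students)
print(labels)
```

The resulting list can then be passed as ax.bar_label(bars, labels=labels).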

How to make four subplots in one figure with four different protein sequences?

How do I make four subplots in one figure and save it to my desktop? I'm having trouble with the input prompt, where you should be able to enter 4 different protein sequences.
import numpy as np
from collections import defaultdict
from matplotlib import pyplot as plt
protein_input = input('Protein Sequence: ')
protein_nospace = protein_input.strip()
# plot protein frequency and print graph
x_values = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']
counts = defaultdict(int)
for aa in protein_nospace:
    if aa in x_values:
        counts[aa] += 1
    else:
        counts[aa] = 1
y_values = np.array([v for v in counts.values()])
plt.figure()
plt.bar(x_values, y_values)
plt.title('Amino acid Frequencies')
plt.xlabel('Amino Acids')
plt.ylabel('Frequency')
plt.show()
You could create a subplot for each of the proteins. Matplotlib's object-oriented interface makes it easy to draw each protein into its own subplot, and plt.savefig saves the figure to a file.
In general, it is not a good idea to read this type of input interactively. That way, you can't easily check for errors, nor can you easily refer back to the input later to verify how the plot was created.
For short scripts and short inputs, the easiest approach is to copy-paste your input into the code and then save the code for later reference. For longer inputs, or to draw the same graph from other input sets, you can save just the strings in separate files.
Nevertheless, the code below shows how the input can be read interactively. (The triple-quoted block contains a version with "hard-coded" inputs.)
from matplotlib import pyplot as plt
from collections import defaultdict
protein_input_list = []
while True:
    protein_input = input('Enter next protein sequence (empty input to stop): ')
    protein_nospace = protein_input.strip()
    if len(protein_nospace) == 0:
        break
    else:
        protein_input_list.append(protein_nospace)
'''
protein_input_list = ['ABABDEADWDSWEFAECD',
                      'DEFSDMDSLHEWVOIHWEAAEHRG',
                      'HIWEORMLSDAWEEFWEFWEEWJK',
                      'JLSSFSFLIWIJOWHOE']
'''
# squeeze=False keeps `axes` a 2D array, so .ravel() also works for a single sequence
fig, axes = plt.subplots(ncols=len(protein_input_list), figsize=(15, 5), squeeze=False)
for index, (ax, protein_nospace) in enumerate(zip(axes.ravel(), protein_input_list)):
    x_values = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']
    counts = defaultdict(int)
    for aa in protein_nospace:
        if aa in x_values:
            counts[aa] += 1
        else:
            counts[aa] = 1  # letters outside the 20 standard codes are only flagged, not counted
    ax.bar(counts.keys(), counts.values())
    ax.set_title(f'Amino acid Frequencies {index+1}')
    ax.set_xlabel('Amino Acids')
    ax.set_ylabel('Frequency')
plt.savefig('Amino acid Frequencies.png')
plt.show()
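The counting step can also be written so that every subplot gets the same 20 bars in the same order, which makes the panels easier to compare. The sketch below uses collections.Counter; amino_acid_counts is a hypothetical helper, and upper-casing the input and dropping non-standard letters are assumptions that differ slightly from the loop above.

```python
from collections import Counter

AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'  # the 20 standard one-letter codes


def amino_acid_counts(sequence):
    """Return counts of the 20 standard amino acids, in AMINO_ACIDS order.

    Hypothetical helper: letters outside AMINO_ACIDS are ignored.
    """
    counts = Counter(sequence.strip().upper())
    # Counter returns 0 for missing keys, so absent amino acids get a zero bar.
    return [counts[aa] for aa in AMINO_ACIDS]


heights = amino_acid_counts('ABABDEADWDSWEFAECD')
print(dict(zip(AMINO_ACIDS, heights)))
```

Inside the plotting loop this could replace the dictionary with ax.bar(list(AMINO_ACIDS), amino_acid_counts(protein_nospace)).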

How to add markers on legend and graph - matplotlib

I have the following code:
from matplotlib import pyplot as plt
import seaborn as sns
fig = plt.figure()
fig.suptitle('Average GPA and Standard Deviation per course combination', fontsize=15)
plt.xlabel('Standard Deviation of Average GPA', fontsize=12)
plt.ylabel('Average GPA', fontsize=12)
colors = ['#E74C3C', '#76448A', '#3498DB', '#17A589', '#F1C40F', '#F39C12', '#CA6F1E', '#B3B6B7', '#34495E',
          '#F5B7B1']
marker = ['.','v','^','1','2','8','p','P','x','X']
g = sns.scatterplot(x=all_stdev, y=all_gpas, hue=final_courses)
g.legend(loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.show()
all_stdev, all_gpas and final_courses are lists that change every time, since they are recommendations computed from the user's input. The result I get for a particular student is the following:
I tried adding markers to make the results easier for the user to read, but nothing I tried worked. The markers would appear in the graph all in the same color, while the legend would still show all the colors as above. I need the markers to appear both in the graph and in the legend. Is there a way to do this? I have a list of markers that I would like to use in the code provided.
You need to pass the markers and the colors as parameters to scatterplot: style= selects the variable that drives the marker, markers= supplies the marker list, and palette= supplies the colors.
There is one more problem with the markers. Seaborn complains: Filled and line art markers cannot be mixed. So you need to choose either all filled or all line art markers.
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
fig = plt.figure()
fig.suptitle('Average GPA and Standard Deviation per course combination', fontsize=15)
plt.xlabel('Standard Deviation of Average GPA', fontsize=12)
plt.ylabel('Average GPA', fontsize=12)
colors = ['#E74C3C', '#76448A', '#3498DB', '#17A589', '#F1C40F', '#F39C12', '#CA6F1E', '#B3B6B7', '#34495E', '#F5B7B1']
# marker = ['.', 'v', '^', '1', '2', '8', 'p', 'P', 'x', 'X']
marker = ['o', 'v', '^', '8', '*', 'P', 'D', 'X', 's', 'p']
N = 30
final_courses = np.random.randint(1,11, N) * 10
all_stdev = np.random.uniform(0, 2, N)
all_gpas = np.random.uniform(3, 4, N)
g = sns.scatterplot(x=all_stdev, y=all_gpas, hue=final_courses, style=final_courses, palette=colors, markers=marker)
g.legend(loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.show()
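Since final_courses changes from user to user, it may help to pin each course value to a fixed color and marker instead of relying on list order. seaborn accepts dictionaries for both palette and markers; the sketch below only builds those dictionaries, using made-up course values, and assumes there are no more distinct values than entries in the two lists.

```python
colors = ['#E74C3C', '#76448A', '#3498DB', '#17A589', '#F1C40F',
          '#F39C12', '#CA6F1E', '#B3B6B7', '#34495E', '#F5B7B1']
marker = ['o', 'v', '^', '8', '*', 'P', 'D', 'X', 's', 'p']  # all filled markers

final_courses = [30, 10, 20, 10, 30]  # made-up hue values for illustration

# One entry per distinct value, in sorted order so the legend is stable.
levels = sorted(set(final_courses))
if len(levels) > len(colors) or len(levels) > len(marker):
    raise ValueError("not enough colors or markers for all course values")

palette_map = dict(zip(levels, colors))
marker_map = dict(zip(levels, marker))
print(palette_map)
print(marker_map)
```

These would be passed as sns.scatterplot(..., hue=final_courses, style=final_courses, palette=palette_map, markers=marker_map).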

Last bin of histogram missing

First, this answer does not work for me, but the problem is essentially the same. My data x is a list in the range [-2:18], labeled as [A:U]. The last bin (17 or T) is actually accumulating the number of values of 17-T and 18-U, showing bin 18-U empty.
My code looks like this (aesthetics have been omitted, x was read from a .csv):
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=figsize)
Labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
          'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U']
bins = len(Labels)
ax.hist(x, bins=bins, density=False, histtype='step', color='grey', linewidth=2)
ax.set_xticklabels(Labels)
plt.show()
The result is this:
Trying the existing solution, bins = len(Labels) + 1 does not make any difference.
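One common way around this, assuming x holds integers from -2 to 18, is to pass explicit bin edges at half-integers instead of a bin count, so that every integer sits in the middle of its own bin. The sketch below is plain Python and only demonstrates the edge arithmetic; integer_bin_edges and bin_index are hypothetical helper names.

```python
from bisect import bisect_right


def integer_bin_edges(lo, hi):
    """Edges at half-integers so each integer lo..hi gets its own bin."""
    return [v - 0.5 for v in range(lo, hi + 2)]


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) that contains value."""
    return bisect_right(edges, value) - 1


edges = integer_bin_edges(-2, 18)  # 22 edges from -2.5 to 18.5, i.e. 21 bins
print(len(edges) - 1, bin_index(17, edges), bin_index(18, edges))
```

With matplotlib this would become ax.hist(x, bins=edges, ...). The labels also need explicit tick positions, for example ax.set_xticks(range(-2, 19)) before ax.set_xticklabels(Labels); otherwise the letters are attached to whatever default ticks the axis happens to have.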

Obtaining the exact data coordinates of seaborn boxplot boxes

I have a seaborn boxplot (sns.boxplot) on which I would like to add some points. For example, say I have this pandas DataFrame:
[In] import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'Property 1': ['a']*100 + ['b']*100,
                   'Property 2': ['w', 'x', 'y', 'z']*50,
                   'Value': np.random.normal(size=200)})
df.head(3)
[Out]
  Property 1 Property 2     Value
0          a          w  1.421380
1          a          x -1.034465
2          a          y  0.212911
[In] df.shape
[Out] (200, 3)
I can easily generate a boxplot with seaborn:
[In] sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
[Out]
Now say I want to add markers for a specific case in my sample. I can get close with this:
[In] specific_case = pd.DataFrame([['a', 'w', '0.5'],
                                   ['a', 'x', '0.2'],
                                   ['a', 'y', '0.1'],
                                   ['a', 'z', '0.3'],
                                   ['b', 'w', '-0.5'],
                                   ['b', 'x', '-0.2'],
                                   ['b', 'y', '0.3'],
                                   ['b', 'z', '0.5']],
                                  columns=df.columns)
[In] sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
plt.plot(np.arange(-0.25, 3.75, 0.5),
specific_case['Value'].values, 'ro')
[Out]
That is unsatisfactory, of course.
I then used this answer that talks about getting the bBox and this tutorial about converting display coordinates into data coordinates to write this function:
[In] import matplotlib as mpl
def get_x_coordinates_of_seaborn_boxplot(ax, x_or_y):
    display_coordinates = []
    inv = ax.transData.inverted()
    for c in ax.get_children():
        if type(c) == mpl.patches.PathPatch:
            if x_or_y == 'x':
                display_coordinates.append(
                    (c.get_extents().xmin+c.get_extents().xmax)/2)
            if x_or_y == 'y':
                display_coordinates.append(
                    (c.get_extents().ymin+c.get_extents().ymax)/2)
    return inv.transform(tuple(display_coordinates))
That works great for my first hue, but not at all for my second:
[In] ax = sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
coords = get_x_coordinates_of_seaborn_boxplot(ax, 'x')
plt.plot(coords, specific_case['Value'].values, 'ro')
[Out]
How can I get the data coordinates of all my boxes?
I'm unsure about the purpose of those transformations. It seems the real problem is just plotting the points from specific_case at the correct positions. The x-coordinate of every box is shifted by 0.2 from the whole number. (That is because each group is 0.8 wide by default; with 2 boxes per group, each box is 0.4 wide, and half of that is 0.2.)
You then need to arrange the x values so that their order matches the rows of the specific_case dataframe.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'Property 1': ['a']*100 + ['b']*100,
                   'Property 2': ['w', 'x', 'y', 'z']*50,
                   'Value': np.random.normal(size=200)})
# Values are numbers (not strings) so they land on the numeric y-axis
specific_case = pd.DataFrame([['a', 'w', 0.5],
                              ['a', 'x', 0.2],
                              ['a', 'y', 0.1],
                              ['a', 'z', 0.3],
                              ['b', 'w', -0.5],
                              ['b', 'x', -0.2],
                              ['b', 'y', 0.3],
                              ['b', 'z', 0.5]],
                             columns=df.columns)
ax = sns.boxplot(x='Property 2', hue='Property 1', y='Value', data=df)
# Row 0: x positions for hue 'a' (shifted left); row 1: hue 'b' (shifted right)
X = np.repeat(np.atleast_2d(np.arange(4)), 2, axis=0) + np.array([[-.2], [.2]])
ax.plot(X.flatten(), specific_case['Value'].values, 'ro', zorder=4)
plt.show()
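The 0.2 offset generalizes to any number of hue levels. The sketch below computes the box centers in plain Python, assuming the default total group width of 0.8 and evenly dodged boxes as described above; box_centers is a hypothetical helper, not a seaborn function.

```python
def box_centers(n_groups, n_hues, group_width=0.8):
    """Return one list of x centers per hue level (hypothetical helper).

    Each group is centered on an integer and split evenly between hues.
    """
    box_width = group_width / n_hues
    return [[g + (h - (n_hues - 1) / 2) * box_width for g in range(n_groups)]
            for h in range(n_hues)]


centers = box_centers(4, 2)
print(centers)  # first list: hue 'a' positions, second list: hue 'b' positions
```

For 4 groups and 2 hues this gives the same positions as the X array in the code above, one row per hue level.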
I got it figured out:
In your code do this to extract the x-coordinate based on hue. I did not do it for the y, but the logic should be the same:
Create two lists holding your x coordinate:
display_coordinates_1=[]
display_coordinates_2=[]
Inside your for loop that starts with:
for c in ax.get_children():
Use the following:
display_coordinates_1.append(c.get_extents().x0)
You need x0 for the x-coordinate of boxplots under first hue.
The following gives you the x-coordinates for the subplots in the second hue. Note the use of x1 here:
display_coordinates_2.append(c.get_extents().x1)
Lastly, after you pass the two lists through inv.transform(), make sure you select every other value: for the x-coordinates each list has 6 outputs, and you want the ones at indices 0, 2, 4, i.e. [::2].
Hope this helps.