How to plot a table of a Pandas DataFrame using Matplotlib

I want to plot a table of a Pandas DataFrame using Matplotlib's tight_layout() in Colab.
First, LaTeX was not found while running my code in Colab. I tried this, but then I got a ValueError: list.remove(x): x not in list error. I get the same error in JupyterLab too, but in the terminal it works!
How can I make this code work in Colab?
import pandas as pd
import matplotlib.pyplot as plt
# sample data
df = pd.DataFrame()
df['P(S)'] = [0.4, 0.3]
df['P(F)'] = [0.2, 0.1]
fig, ax = plt.subplots()
# hide axes
fig.patch.set_visible(False)
ax.axis('off')
ax.axis('tight')
ax.table(cellText=df.values, colLabels=df.columns, loc='center')
fig.tight_layout()
plt.show()
/usr/local/lib/python3.6/dist-packages/matplotlib/figure.py in get_default_bbox_extra_artists(self)
2234 bbox_artists.extend(ax.get_default_bbox_extra_artists())
2235 # we don't want the figure's patch to influence the bbox calculation
-> 2236 bbox_artists.remove(self.patch)
2237 return bbox_artists
2238
ValueError: list.remove(x): x not in list
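No answer was recorded for this question, but the traceback hints at a likely cause. In the Matplotlib version shown, get_default_bbox_extra_artists appears to collect only visible artists and then removes self.patch unconditionally, so a figure patch hidden with fig.patch.set_visible(False) would not be in the list. A minimal sketch of a workaround, assuming that diagnosis is right, is to leave the patch visible and make it transparent instead:

```python
import matplotlib.pyplot as plt
import pandas as pd

# sample data, as in the question
df = pd.DataFrame({"P(S)": [0.4, 0.3], "P(F)": [0.2, 0.1]})

fig, ax = plt.subplots()

# Keep the figure patch visible but fully transparent, instead of calling
# fig.patch.set_visible(False), so tight_layout() can still find it.
fig.patch.set_alpha(0.0)

# hide axes
ax.axis("off")
ax.axis("tight")
table = ax.table(cellText=df.values, colLabels=df.columns, loc="center")

fig.tight_layout()
```

Call plt.show() afterwards to display the table. Upgrading Matplotlib may also make the original code work, but that is untested here.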

Related

How to plot (correctly) lineplot from pandas dataframe?

I'm plotting a line plot from a pandas DataFrame. However, the labels overlap on the right side of the x-axis instead of lining up with their marker points on the line. What is missing?
Here is the full code and the picture:
#importing pandas package
import pandas as pd
import matplotlib.pyplot as plt
import csv
import seaborn as sns
# making data frame from csv file
dataset = pd.read_csv('curve.csv.csv')
df = pd.DataFrame(dataset.sort_values('Split')[['Split', 'Score']])
df.reset_index(drop=True, inplace=True)
print(df)
ax = df.plot.line(x='Split',y='Score',color='green',marker=".")
ax.set_xlim((0, 1))
ax.grid(True)
# set the tick marks for x axis
ax.set_xticks(df.Score)
ax.set_xticklabels(['.005','.010','.015','.020','.040','.060','.080','1','15','20','25','30','35','40','45','50','55','60'
,'65','70','75','80','85','90','95'])
ax.grid(True, linestyle='-.')
ax.tick_params(labelcolor='r', labelsize='medium', width=3)
plt.show()
My desired output would have every label on the x-axis aligned with its corresponding marker point on the line.
You seem to be using the y-values (df.Score) as the positions of your x-ticks.
I assume you meant
ax.set_xticks(df['Split'])
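To illustrate the fix, here is a small sketch with made-up numbers (the question's CSV is not shown, so the values below are an assumption). The tick labels are derived from the same column as the tick positions, which keeps the two lists the same length:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's CSV (the real file is not shown).
df = pd.DataFrame({"Split": [0.1, 0.25, 0.5, 0.75, 1.0],
                   "Score": [0.62, 0.70, 0.78, 0.81, 0.83]})

ax = df.plot.line(x="Split", y="Score", color="green", marker=".")

# Tick positions come from the x column; the labels are built from the same
# values, so positions and labels always match one to one.
ax.set_xticks(df["Split"])
ax.set_xticklabels([f"{v:g}" for v in df["Split"]])
ax.grid(True, linestyle="-.")
```

With a hand-written label list, as in the question, the list must have exactly one entry per tick position.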

Creating a grouped bar plot with Seaborn

I am trying to create a grouped bar graph using Seaborn but I am getting a bit lost in the weeds. I actually have it working but it does not feel like an elegant solution. Seaborn only seems to support clustered bar graphs when there is a binary option such as Male/Female. (https://seaborn.pydata.org/examples/grouped_barplot.html)
It does not feel right having to fall back onto matplotlib so much - using the subplots feels a bit dirty :). Is there a way of handling this completely in Seaborn?
Thanks,
Andrew
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
sns.set_theme(style="whitegrid")
rcParams.update({'figure.autolayout': True})
dataframe = pd.read_csv("https://raw.githubusercontent.com/mooperd/uk-towns/master/uk-towns-sample.csv")
dataframe = dataframe.groupby(['nuts_region']).agg({'elevation': ['mean', 'max', 'min'],
'nuts_region': 'size'}).reset_index()
dataframe.columns = list(map('_'.join, dataframe.columns.values))
# We need to melt our dataframe down into a long format.
tidy = dataframe.melt(id_vars='nuts_region_').rename(columns=str.title)
# Create a subplot. A Subplot makes it convenient to create common layouts of subplots.
# https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.subplots.html
fig, ax1 = plt.subplots(figsize=(6, 6))
# https://stackoverflow.com/questions/40877135/plotting-two-columns-of-dataframe-in-seaborn
g = sns.barplot(x='Nuts_Region_', y='Value', hue='Variable', data=tidy, ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
I'm not sure why you need seaborn. Your data is wide format, so pandas does it pretty well without the need for melting:
from matplotlib import rcParams
sns.set(style="whitegrid")
rcParams.update({'figure.autolayout': True})
fig, ax1 = plt.subplots(figsize=(12,6))
dataframe.plot.bar(x='nuts_region_', ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
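If you do want to stay within Seaborn, one option is the figure-level catplot function, which creates the figure and axes itself so no plt.subplots call is needed. This is only a sketch: the data and the column names (region, statistic, metres) are made up for illustration and are not the towns dataset from the question.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data standing in for the aggregated towns table.
wide = pd.DataFrame({
    "region": ["North", "South", "West"],
    "elevation_mean": [120.0, 80.0, 150.0],
    "elevation_max": [300.0, 210.0, 420.0],
    "elevation_min": [5.0, 2.0, 10.0],
})

# Long format: one row per (region, statistic) pair.
tidy = wide.melt(id_vars="region", var_name="statistic", value_name="metres")

# catplot is figure-level, so it builds the figure and axes on its own.
# hue is not limited to two levels; here it has three.
g = sns.catplot(data=tidy, x="region", y="metres", hue="statistic",
                kind="bar", height=5, aspect=1.4)
g.set_xticklabels(rotation=45, ha="right")
```

The returned object is a FacetGrid, so further tweaks go through g (or g.ax for the underlying Matplotlib axes).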

Stacking multiple plots on a 2Y axis

I am trying to plot multiple plots in a 2Y plot.
I have code that does the following:
takes a list of files to read data from;
gets the x and y components of the data to plot on y-axis 1 and y-axis 2;
plots the data.
When the loop iterates, it plots on different figures. I would like to get all the plots in the same figure.
Can anyone give me some help on this?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
files=[list of paths]
for path in files:
    # Loads Data from an excel file
    data = pd.read_excel(path,sheet_name='Results',dtype=float)
    # Gets x and y data from the loaded files
    x=data.iloc[:,-3]
    y1=data.iloc[:,-2]
    y12=data.iloc[:,-1]
    y2=data.iloc[:,3]
    fig1=plt.figure()
    ax1 = fig1.add_subplot(111)
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax1.plot(x,y1)
    ax1.semilogy(x,y12)
    ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
    ax2.plot(x,y2)
    fig1.tight_layout()
    plt.show()
You should create the figure and its axes once, outside the loop, and then plot onto them while iterating. That way you get a single figure with all the plots in it.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
files=[list of paths]
fig1=plt.figure()
ax1 = fig1.add_subplot(111)
ax2 = ax1.twinx() # second axes sharing the same x-axis, created only once
ax1.set_xlabel('x')
ax1.set_ylabel('y')
for path in files:
    # Loads Data from an excel file
    data = pd.read_excel(path,sheet_name='Results',dtype=float)
    # Gets x and y data from the loaded files
    x=data.iloc[:,-3]
    y1=data.iloc[:,-2]
    y12=data.iloc[:,-1]
    y2=data.iloc[:,3]
    # Every file is drawn onto the same pair of axes
    ax1.plot(x,y1)
    ax1.semilogy(x,y12)
    ax2.plot(x,y2)
fig1.tight_layout()
plt.show()
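Because the snippet above depends on Excel files that are not shown, here is a self-contained sketch of the same idea with synthetic data. The three datasets and their shapes are assumptions made purely for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the per-file data (assumption: three datasets).
x = np.linspace(1, 10, 50)
datasets = [(x, np.exp(0.3 * x) * k, np.sin(x) + k) for k in (1, 2, 3)]

# Create the figure and both y-axes once, before the loop.
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax1.set_xlabel("x")
ax1.set_ylabel("y1 (log scale)")
ax2.set_ylabel("y2")

# Each dataset is drawn onto the same two axes.
for k, (xs, y1, y2) in enumerate(datasets, start=1):
    ax1.semilogy(xs, y1, label=f"y1, set {k}")
    ax2.plot(xs, y2, linestyle="--", label=f"y2, set {k}")

fig.tight_layout()
```

Calling plt.show() once, after the loop, then displays a single figure containing every curve.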

Creating a 1D heat map from a line graph

Is it possible to create a 1D heat map from data in a line graph? i.e. I'd like the highest values in y to represent the warmer colours in a heat map. I've attached an example image of the heat map I'd like it to look like as well as data I currently have in the line graph.
1D heat map and graph example:
To get the heatmap in the image shown I used the following code in python with matplotlib.pyplot:
heatmap, xedges, yedges = np.histogram2d(x, y, bins=(np.linspace(0,length_track,length_track+1),1))
extent = [0, length_track+1, 0, 50]
plt.imshow(heatmap.T, extent=extent, origin='lower', cmap='jet',vmin=0,vmax=None)
But I believe this only works if the data is represented as a scatter plot.
If we assume that the data is equally spaced, one may use an imshow plot to recreate the plot from the question.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
plt.rcParams["figure.figsize"] = 5,2
x = np.linspace(-3,3)
y = np.cumsum(np.random.randn(50))+6
fig, (ax,ax2) = plt.subplots(nrows=2, sharex=True)
extent = [x[0]-(x[1]-x[0])/2., x[-1]+(x[1]-x[0])/2.,0,1]
ax.imshow(y[np.newaxis,:], cmap="plasma", aspect="auto", extent=extent)
ax.set_yticks([])
ax.set_xlim(extent[0], extent[1])
ax2.plot(x,y)
plt.tight_layout()
plt.show()
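If the x values are not equally spaced, imshow no longer places the colour cells correctly. One possible alternative, sketched below under that assumption, is pcolormesh with explicit cell edges placed halfway between neighbouring samples. The random data is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, size=50))   # unevenly spaced sample positions
y = np.cumsum(rng.standard_normal(50)) + 6

# Cell edges: midpoints between neighbouring x values, extended by half a
# step at each end, which gives len(x) + 1 edges for len(x) cells.
mid = (x[:-1] + x[1:]) / 2
edges = np.concatenate(([x[0] - (mid[0] - x[0])], mid,
                        [x[-1] + (x[-1] - mid[-1])]))

fig, (ax, ax2) = plt.subplots(nrows=2, sharex=True, figsize=(5, 2))
# One row of cells, coloured by y, with widths following the x spacing.
ax.pcolormesh(edges, [0, 1], y[np.newaxis, :], cmap="plasma")
ax.set_yticks([])
ax2.plot(x, y)
fig.tight_layout()
```

Each coloured cell then spans the interval around its own sample, so wide gaps in x appear as wide cells.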

Overlapping axis label with length distribution

I'm new to plotting in Python. I want to plot these lists with this code:
import numpy as np
import matplotlib.pyplot as plt
alphab = [172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465]
frequencies = [24,17,21,27,10,21,26,41,23,27,25,22,21,24,31,24,19,18,27,15,29,28,22,35,35,28,30,20,29,42,39,35,30,29,38,32,35,47,30,44,55,34,41,41,46,56,39,39,57,39,58,44,51,52,51,44,57,48,50,59,54,46,64,63,56,60,74,72,75,72,60,75,74,55,75,69,70,69,73,69,63,80,70,74,62,77,69,78,70,68,68,80,71,77,79,64,83,76,64,92,77,93,86,65,88,86,79,91,79,97,87,67,83,96,94,79,102,114,89,92,90,112,100,107,98,95,99,95,96,91,103,111,85,105,113,103,105,95,110,103,111,102,102,117,127,128,110,100,122,99,126,99,113,114,133,129,118,120,105,121,112,115,118,127,109,116,96,101,98,98,94,114,94,87,83,117,87,105,120,116,96,112,92,106,115,107,98,107,87,86,111,108,113,106,109,102,89,81,102,87,124,127,116,106,98,106,117,95,113,107,121,92,102,97,94,94,122,110,101,118,112,106,95,112,115,102,136,114,125,136,126,120,116,119,140,114,125,148,126,137,140,129,134,124,141,126,127,124,162,124,137,136,137,142,156,131,153,150,139,131,143,119,145,142,135,151,117,143,151,146,149,125,109,124,135,144,125,127,161,120,158,112,129,125,134,131,130,122,118,145,132,123,131,129]
pos = np.arange(len(alphab))
plt.bar(pos, frequencies)
plt.xticks(pos, alphab, rotation=90)
plt.show()
but I get the following:
how could I get this?
The lists are a length distribution, e.g. 172 appears 24 times, ..., 465 appears 129 times.
Thanks for your help.
option 1
Let plt figure it out
plt.bar(pos, frequencies)
# plt.xticks(pos, alphab, rotation=90)
plt.show()
option 2
Show only every 50th tick, slicing the positions and the labels with the same step so they stay matched
plt.bar(pos, frequencies)
plt.xticks(pos[::50], alphab[::50], rotation=90)
plt.show()
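A third possibility, given that the lengths are numbers, is to pass them directly as the x values, so Matplotlib picks a readable set of ticks that show real lengths rather than list positions. A small sketch, using random counts in place of the question's frequencies:

```python
import matplotlib.pyplot as plt
import numpy as np

# Same length range as the question (172..465); the counts are synthetic
# stand-ins for the question's frequencies list.
lengths = np.arange(172, 466)
rng = np.random.default_rng(0)
counts = rng.integers(10, 160, size=lengths.size)

fig, ax = plt.subplots()
# Numeric x values let Matplotlib choose the tick locations automatically.
ax.bar(lengths, counts, width=1.0)
ax.set_xlabel("length")
ax.set_ylabel("frequency")
```

With the original lists this would be plt.bar(alphab, frequencies) followed by plt.show().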