How to increase the space between bars and the bar width in matplotlib - dataframe

I am scraping a table directly from the Wikipedia website and plotting it. I want to increase the bar width, add space between the bars, and make all bars visible. How can I do this? My code is below.
import requests
import pandas as pd
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup

######### scraping #########
html = requests.get("https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria")
bsObj = BeautifulSoup(html.content, 'html.parser')
states = []
cases = []
for items in bsObj.find("table", {"class": "wikitable sortable"}).find_all('tr')[1:37]:
    data = items.find_all(['th', {"align": "left"}, 'td'])
    states.append(data[0].a.text)
    cases.append(data[1].b.text)
######## dataframe #########
table = ["STATES", "CASES"]
tab = pd.DataFrame(list(zip(states, cases)), columns=table)
tab["CASES"] = tab["CASES"].replace('\n', '', regex=True)
tab["CASES"] = tab["CASES"].replace(',', '', regex=True)
tab['CASES'] = pd.to_numeric(tab['CASES'], errors='coerce')
tab["CASES"] = tab["CASES"].fillna(0)
tab["CASES"] = tab["CASES"].values.astype(int)
####### matplotlib ########
x = tab["STATES"]
y = tab["CASES"]
plt.cla()
plt.locator_params(axis='y', nbins=len(y) // 4)  # nbins should be an integer
plt.bar(x, y, color="blue")
plt.xticks(fontsize=8, rotation='vertical')
plt.yticks(fontsize=8)
plt.show()

Use pandas.read_html and barh
.read_html reads every table tag on a page and returns a list of dataframes.
barh draws horizontal instead of vertical bars, which is easier to read when there are many bars.
Make the figure taller if needed: in figsize (16.0, 10.0), increase the second value (10.0).
I'd recommend a log scale for x, because Lagos has so many more cases than Kogi.
This doesn't put more space between the bars, but the plot is more legible with its larger dimensions and horizontal bars.
.iloc[:36, :5] removes some unneeded rows and columns from the dataframe.
import pandas as pd
import matplotlib.pyplot as plt
# url
url = 'https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria'
# create dataframe list
dataframe_list = pd.read_html(url) # this is a list of all the tables at the url as dataframes
# get the dataframe from the list
df = dataframe_list[2].iloc[:36, :5] # you want the dataframe at index 2
# replace '–' with 0
df.replace('–', 0, inplace=True)
# set to int
for col in df.columns[1:]:
    df[col] = df[col].astype('int')
# plot a horizontal bar
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.style.use('ggplot')
p = plt.barh(width='Cases', y='State', data=df, color='purple')
plt.xscale('log')
plt.xlabel('Number of Cases')
plt.show()
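The cleaning step is worth checking in isolation before plotting. Below is a minimal sketch, assuming a made-up three-row table (the values are illustrative, not the Wikipedia figures) in which counts arrive as strings with thousands separators and an en dash for missing values. The names raw and cleaned are hypothetical.

```python
import pandas as pd

# Made-up sample rows for illustration only (not real case counts)
raw = pd.DataFrame({
    "State": ["Lagos", "Kogi", "Kano"],
    "Cases": ["5,000", "–", "800"],
})

cleaned = raw.copy()
# Strip thousands separators, turn anything non-numeric into NaN,
# then fill the gaps with 0 and convert to integers
cleaned["Cases"] = (
    pd.to_numeric(cleaned["Cases"].str.replace(",", "", regex=False), errors="coerce")
    .fillna(0)
    .astype(int)
)
print(cleaned)
```

Coercing with pd.to_numeric handles any unexpected placeholder, whereas replacing one specific character only handles that character.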
Plot all the data in df
df.set_index('State', inplace=True)
# DataFrame.plot creates its own figure, so pass figsize to it directly
df.plot.barh(figsize=(14, 14))
plt.xscale('log')
plt.show()
4 subplots, with State as the index
plt.figure(figsize=(14, 14))
for i, col in enumerate(df.columns, 1):
    plt.subplot(2, 2, i)
    df[col].plot.barh(label=col, color='green')
    plt.xscale('log')
    plt.legend()
plt.tight_layout()
plt.show()
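If the goal really is thicker or thinner bars with a visible gap, the height argument of barh (or width for vertical bar) controls how much of each one-unit slot a bar occupies. This is a minimal sketch with made-up numbers; df_demo and its values are assumptions for illustration, not the scraped data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data for illustration only
df_demo = pd.DataFrame({
    "State": ["Lagos", "Kano", "Kogi", "Oyo"],
    "Cases": [5000, 800, 3, 600],
})

fig, ax = plt.subplots(figsize=(8, 4))
# Categories sit one unit apart, so height=0.5 leaves half of each slot empty
bars = ax.barh(df_demo["State"], df_demo["Cases"], height=0.5, color="purple")
ax.set_xscale("log")
ax.set_xlabel("Number of Cases")
fig.tight_layout()
```

Values closer to 1.0 make the bars thicker and the gaps smaller. Increasing the second figsize value gives every bar more room without changing the ratio.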

Related

Pandas bar chart labelling? [duplicate]

This question already has answers here:
How to add value labels on a bar chart
I would like to add data labels to factor plots generated by Seaborn. Here is an example:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
titanic_df = pd.read_csv('train.csv')
sns.factorplot('Sex',data=titanic_df,kind='count')
How can I add the 'count' values to the top of each bar on the graph?
You could do it this way:
import math
# Set plotting style
sns.set_style('whitegrid')
# Round up to the next hundred, plus an offset of 100
def roundup(x):
    return 100 + int(math.ceil(x / 100.0)) * 100
df = pd.read_csv('train.csv')
sns.factorplot('Sex', data=df, kind='count', alpha=0.7, size=4, aspect=1)
# Get current axis on current figure
ax = plt.gca()
# ylim max value to be set
y_max = df['Sex'].value_counts().max()
ax.set_ylim([0, roundup(y_max)])
# Iterate through the list of axes' patches
for p in ax.patches:
    ax.text(p.get_x() + p.get_width()/2., p.get_height(), '%d' % int(p.get_height()),
            fontsize=12, color='red', ha='center', va='bottom')
plt.show()
You could do something even simpler
# catplot creates its own figure; height and aspect control its size
# kind='count' counts the rows per category (the data has no 'count' column)
plot = sns.catplot(x='Sex', kind='count', data=titanic_df, height=3, aspect=4/3)
# plot.ax gives the axis object
# plot.ax.patches gives the list of bars, which can be accessed by index starting at 0
for i, bar in enumerate(plot.ax.patches):
    h = bar.get_height()
    plot.ax.text(
        i,  # bar index (x coordinate of text)
        h+10,  # y coordinate of text
        '{}'.format(int(h)),  # y label
        ha='center',
        va='center',
        fontweight='bold',
        size=14)
The above answer from @nickil-maveli is simply great.
This is just to add some clarity about the parameters used when adding data labels to the barplot (as requested in the comments by @user27074).
# loop through all bars of the barplot
for nr, p in enumerate(ax.patches):
    # height of bar, which is basically the data value
    height = p.get_height()
    # add text to specified position
    ax.text(
        # bar to which data label will be added
        # so this is the x-coordinate of the data label
        nr,
        # height of data label: height / 2. is in the middle of the bar
        # so this is the y-coordinate of the data label
        height / 2.,
        # formatting of data label
        u'{:0.1f}%'.format(height),
        # color of data label
        color='black',
        # size of data label
        fontsize=18,
        # horizontal alignment: possible values are center, right, left
        ha='center',
        # vertical alignment: possible values are top, bottom, center, baseline
        va='center'
    )
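Newer Matplotlib releases (3.4 and later, as far as I know) also provide Axes.bar_label, which places the labels without a manual loop over patches. The sketch below uses plain Matplotlib and a made-up five-value series rather than the Titanic file; sex and counts are hypothetical names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up categories for illustration only
sex = pd.Series(["male", "male", "female", "male", "female"])
counts = sex.value_counts()

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(counts.index, counts.values, alpha=0.7)
# bar_label reads each bar's value and writes it just above the bar
labels = ax.bar_label(bars, fmt="%d", padding=2)
ax.set_ylim(0, counts.max() + 1)  # leave headroom for the labels
```

With seaborn plots the same call can be applied to each container in ax.containers, assuming the installed Matplotlib is recent enough.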

How to make a legend with matplotlib plotting a map?

I have the following plot:
Inside this map you see some randomly coloured areas with numbers in them. The numbers are 12, 25, 34, 38 and 43. Now I want to add a legend in the upper left corner with the numbers followed by the name of the area. Something like this:
The annotations (the numbers in the areas) are added through a for loop using the command ax.annotate(number, xy = ...). Can somebody tell me how to add a legend with all the numbers and text, similar to the image above?
The numbers and names are both inside a pandas dataframe.
fig, ax = plt.subplots(1,1)
fig.set_size_inches(8,8) # setting the size
# Plot values - with grey layout
grey_plot = schap.plot(ax = ax, color = 'grey')
schap.plot(ax = grey_plot, column= col1, cmap= 'YlGnBu', legend = True)
# Add annotation for every waterschap with a deelstroomgebied
bbox_props = dict(boxstyle="round", fc="w", ec="gray", alpha=0.9,lw=0.4)
for idx, row in schap.iterrows():
    if not np.isnan(row[col1]):
        string = str(idx)
        ax.annotate(string, xy=row['coords'], color='black',
                    horizontalalignment='center', bbox=bbox_props, fontsize=7)
'schap' is the pandas dataframe with all needed data. schap['text'] contains all names. In a for loop this would be row['text'].
After the loop you can simply add text, for example:
ax.text(0.01, 0.99, legend,
        horizontalalignment='left',
        verticalalignment='top',
        transform=ax.transAxes,
        fontsize=8)
where legend can be built up inside your loop (text is the column with the names):
legend = ''
#...
# inside your loop
legend = legend + f"{idx} {row['text']}\n"
EDIT:
Example with different data (and alignment of legend):
import geopandas
import matplotlib.pyplot as plt
# world is assumed to be a GeoDataFrame with 'gdp_md_est', 'pop_est' and 'name'
# columns (for example the Natural Earth low-resolution countries dataset)
fig, ax = plt.subplots(1,1)
fig.set_size_inches(20,8)
world.plot(column='gdp_md_est', ax=ax, legend=True)
world['coords'] = world['geometry'].apply(lambda x: x.representative_point().coords[:][0])
bbox_props = dict(boxstyle="round", fc="w", ec="gray", alpha=0.9,lw=0.4)
legend = ''
for idx, row in world.iterrows():
    if row['pop_est'] > 100_000_000:
        plt.annotate(str(idx), xy=row['coords'], color='black',
                     horizontalalignment='center', bbox=bbox_props, fontsize=7)
        legend = legend + f"{idx} {row['name']}\n"
ax.text(0.01, 0.5, legend,
        horizontalalignment='left',
        verticalalignment='center',
        transform=ax.transAxes,
        fontsize=8);
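The text-as-legend idea can be tried without any map data. This is a minimal sketch, assuming a small dataframe whose index holds the area numbers and whose text column holds the names; the names Area A to Area C are placeholders, not real areas.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical area numbers and names, for illustration only
areas = pd.DataFrame({"text": ["Area A", "Area B", "Area C"]}, index=[12, 25, 34])

# One "number name" line per row, joined without a trailing newline
legend = "\n".join(f"{idx} {name}" for idx, name in areas["text"].items())

fig, ax = plt.subplots(figsize=(4, 4))
# transform=ax.transAxes makes (0.01, 0.99) mean "upper left of the axes"
box = ax.text(0.01, 0.99, legend,
              ha="left", va="top",
              transform=ax.transAxes, fontsize=8,
              bbox=dict(boxstyle="round", fc="w", ec="gray"))
```

Joining the lines at the end avoids the trailing newline that repeated string concatenation leaves behind, which would otherwise pad the bottom of the box.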

Pandas: plot a dataframe with on its right side rectangle colored according to an array's values

I have a dataframe with 100 rows and 4 columns, and an array (size 100,1) filled with values between 0 and 1. I would like to plot my dataframe with a rectangle on its right side whose color at each row depends on the value of the array at that row (see the rough drawing I made; the array is written out to help explain what I want). I would like the colors to form a gradient, where 0 = dark blue and 1 = bright red.
I know how to create a colormap, but this is slightly different.
Which function would you advise me to use?
Here is some code I use for the plotting:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
rectangle_values = np.random.rand(100)
plt.figure(figsize=(15,15))
ax = sns.heatmap(df, cbar = None)
My solution would be to use plt.subplots to create two plots, with the width_ratios argument set to something like 19:1. On the left-hand side you plot the data frame as usual; on the right-hand side you plot the vector. Notice that I am using vmin and vmax to set the boundaries (0, 1) for the vector. For the requested colors I'm using Matplotlib's RdBu (red-blue) colormap, reversed (RdBu_r) so that low values are blue and high values are red. You can check the colors against the values: on this run the generated random values were [0.74, 0.96, 0.87, 0.50, 0.26].
df = pd.DataFrame(np.random.randint(0,5,size=(5, 4)), columns=list('ABCD'))
rectangle_values = pd.DataFrame(np.random.rand(5), columns=['foo'])
# keep the axes returned by plt.subplots so the width ratios are respected
fig, (ax_left, ax_right) = plt.subplots(1, 2, gridspec_kw={'width_ratios': [19, 1]})
sns.heatmap(df, cbar=False, ax=ax_left)
sns.heatmap(rectangle_values, cbar=False, cmap='RdBu_r', vmin=0, vmax=1, ax=ax_right)
plt.show()
And the output is:
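For reference, the same two-panel layout can be sketched with plain Matplotlib and imshow instead of seaborn. This is an assumption-laden variant, not the answer's method: data and strip are randomly generated stand-ins for the dataframe and the array, and RdBu_r runs from dark blue to dark red rather than a strictly bright red.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(100, 4))  # stand-in for the dataframe values
strip = rng.random(100)                     # stand-in for the 0..1 array

fig, (ax_main, ax_strip) = plt.subplots(
    1, 2, figsize=(6, 8), gridspec_kw={"width_ratios": [19, 1]}
)
ax_main.imshow(data, aspect="auto", interpolation="nearest")
# One column, one cell per row; vmin/vmax pin the color scale to [0, 1]
im = ax_strip.imshow(strip.reshape(-1, 1), cmap="RdBu_r", vmin=0, vmax=1,
                     aspect="auto", interpolation="nearest")
ax_strip.set_xticks([])
ax_strip.set_yticks([])

# Colors at the two ends of the scale, as RGBA tuples
low = plt.get_cmap("RdBu_r")(0.0)
high = plt.get_cmap("RdBu_r")(1.0)
```

Because both panels have the same number of rows and use aspect="auto", row i of the strip lines up with row i of the main panel.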

Align multi-line ticks in Seaborn plot

I have the following heatmap:
I've broken up the category names by each capital letter and then capitalised them. This achieves a centering effect across the labels on my x-axis by default which I'd like to replicate across my y-axis.
yticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.index]
xticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.columns]
fig, ax = plt.subplots(figsize=(20,15))
sns.heatmap(corr, ax=ax, annot=True, fmt="d",
            cmap="Blues", annot_kws=annot_kws,
            mask=mask, vmin=0, vmax=5000,
            cbar_kws={"shrink": .8}, square=True,
            linewidths=5)
for p in ax.texts:
    myTrans = p.get_transform()
    offset = mpl.transforms.ScaledTranslation(-12, 5, mpl.transforms.IdentityTransform())
    p.set_transform(myTrans + offset)
plt.yticks(plt.yticks()[0], labels=yticks, rotation=0, linespacing=0.4)
plt.xticks(plt.xticks()[0], labels=xticks, rotation=0, linespacing=0.4)
where corr represents a pre-defined pandas dataframe.
I couldn't seem to find an align parameter for setting the ticks and was wondering if and how this centering could be achieved in seaborn/matplotlib?
I've adapted the seaborn correlation plot example below.
from string import ascii_letters
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
# Generate a large random dataset
rs = np.random.RandomState(33)
d = pd.DataFrame(data=rs.normal(size=(100, 7)),
                 columns=['Donald\nDuck','Mickey\nMouse','Han\nSolo',
                          'Luke\nSkywalker','Yoda','Santa\nClause','Ronald\nMcDonald'])
# Compute the correlation matrix
corr = d.corr()
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
for i in ax.get_yticklabels():
    i.set_ha('right')
    i.set_rotation(0)
for i in ax.get_xticklabels():
    i.set_ha('center')
Note the two for loops above. They get each tick label and then set its horizontal alignment. You can also change the vertical alignment with set_va().
The code above produces this:
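For multi-line labels there is a second knob: as far as I know, Matplotlib aligns the lines within a text block by its multialignment, which follows the horizontal alignment unless set explicitly. The sketch below, using plain Matplotlib and made-up labels, anchors the y tick labels on the right while centering their lines relative to each other.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Made-up multi-line labels and values, for illustration only
names = ["DONALD\nDUCK", "HAN\nSOLO", "LUKE\nSKYWALKER"]
values = np.arange(9).reshape(3, 3)

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(values, cmap="Blues")
ax.set_xticks(range(3))
ax.set_xticklabels(names)
ax.set_yticks(range(3))
ax.set_yticklabels(names)

for label in ax.get_yticklabels():
    label.set_ha("right")               # anchor the block next to the axis
    label.set_multialignment("center")  # center the lines within the block
```

The x tick labels need no change here because they are centered by default, which is the effect the question observed on the x-axis.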

Labels in Plots

I am having some issues adding labels to the legend. For some reason matplotlib is ignoring the labels I create in the dataframe. Any help?
pandas version: 0.13.0
matplotlib version: 1.3.1
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
# Sample dataframe
d = {'date': [pd.to_datetime('1/1/2013'), pd.to_datetime('1/1/2014'), pd.to_datetime('1/1/2015')],
     'number': [1,2,3],
     'letter': ['A','B','C']}
df = pd.DataFrame(d)
####################
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(13, 10))
fig.subplots_adjust(hspace=2.0) ## Create space between plots
# Chart 1
df.plot(ax=axes[0], label='one')
# Chart 2
df.set_index('date')['number'].plot(ax=axes[1], label='two')
# add a little sugar
axes[0].set_title('This is the title')
axes[0].set_ylabel('the y axis')
axes[0].set_xlabel('the x axis')
axes[0].legend(loc='best')
axes[1].legend(loc='best');
The problem is that Chart 1 is returning the legend as "number" and I want it to say "one".
I will illustrate this for the first axis; you can repeat it for the second.
In [72]: fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(13, 10))
Get a reference to the axis
In [73]: ax=df.plot(ax=axes[0])
Get the legend
In [74]: legend = ax.get_legend()
Get the text of the legend
In [75]: text = legend.get_texts()[0]
Printing the current text of the legend
In [77]: text.get_text()
Out[77]: u'number'
Setting the desired text
In [78]: text.set_text("one")
Drawing to update
In [79]: plt.draw()
The following figure shows the changed legend for the first axis. You can do the same for the other axis.
NB: IPython autocomplete helped a lot to figure out this answer!
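A shorter route that should work on current pandas and Matplotlib is to pass the wanted labels straight to ax.legend after plotting. This is a minimal sketch reusing the question's sample data; it is an alternative to editing the legend text, not a replacement for the steps above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2014-01-01", "2015-01-01"]),
    "number": [1, 2, 3],
})

fig, ax = plt.subplots(figsize=(6, 4))
df.set_index("date")["number"].plot(ax=ax)
# Labels are matched to the plotted lines in order
ax.legend(["one"], loc="best")

legend_text = ax.get_legend().get_texts()[0].get_text()
```

This relies on the order of the plotted lines, so with several columns the list must contain one label per line in plotting order.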