How to create a heatmap with marginal histograms, similar to a jointplot? - matplotlib

I want to plot 2-dimensional scalar data, which I would usually plot using matplotlib.pyplot.imshow or sns.heatmap. Consider this example:
data = [[10, 20, 30], [50, 50, 100], [80, 60, 10]]
fix, ax = plt.subplots()
ax.imshow(data, cmap=plt.cm.YlGn)
Now I additionally would like to have one-dimonsional bar plots at the top and the right side, showing the sum of the values in each column / row - just as sns.jointplot does. However, sns.jointplot seems only to work with categorical data, producing histograms (with kind='hist'), scatterplots or the like - I don't see how to use it if I want to specify the values of the cells directly. Is such a thing possible with seaborn?
The y axis in my plot is going to be days (within a month), the x axis is going to be hours. My data looks like this:
The field Cost Difference is what should make up the shade of the respective field in the plot.

Here is an approach that first creates a dummy jointplot and then uses its axes to add a heatmap and bar plots of the sums.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
D = 28
H = 24
df = pd.DataFrame({'day': np.repeat(range(1, D + 1), H),
'hour': np.tile(range(H), D),
'Cost Dif.': np.random.uniform(10, 1000, D * H)})
# change the random df to have some rows/columns stand out (debugging, checking)
df.loc[df['hour'] == 10, 'Cost Dif.'] = 150
df.loc[df['hour'] == 12, 'Cost Dif.'] = 250
df.loc[df['day'] == 20, 'Cost Dif.'] = 800
g = sns.jointplot(data=df, x='day', y='hour', kind='hist', bins=(D, H))
g.ax_marg_y.cla()
g.ax_marg_x.cla()
sns.heatmap(data=df['Cost Dif.'].to_numpy().reshape(D, H).T, ax=g.ax_joint, cbar=False, cmap='Blues')
g.ax_marg_y.barh(np.arange(0.5, H), df.groupby(['hour'])['Cost Dif.'].sum().to_numpy(), color='navy')
g.ax_marg_x.bar(np.arange(0.5, D), df.groupby(['day'])['Cost Dif.'].sum().to_numpy(), color='navy')
g.ax_joint.set_xticks(np.arange(0.5, D))
g.ax_joint.set_xticklabels(range(1, D + 1), rotation=0)
g.ax_joint.set_yticks(np.arange(0.5, H))
g.ax_joint.set_yticklabels(range(H), rotation=0)
# remove ticks between heatmao and histograms
g.ax_marg_x.tick_params(axis='x', bottom=False, labelbottom=False)
g.ax_marg_y.tick_params(axis='y', left=False, labelleft=False)
# remove ticks showing the heights of the histograms
g.ax_marg_x.tick_params(axis='y', left=False, labelleft=False)
g.ax_marg_y.tick_params(axis='x', bottom=False, labelbottom=False)
g.fig.set_size_inches(20, 8) # jointplot creates its own figure, the size can only be changed afterwards
# g.fig.subplots_adjust(hspace=0.3) # optionally more space for the tick labels
g.fig.subplots_adjust(hspace=0.05, wspace=0.02) # less spaced needed when there are no tick labels
plt.show()

Related

How to adjust x-axis spacing on distplot? [duplicate]

I am trying to fix how python plots my data.
Say:
x = [0,5,9,10,15]
y = [0,1,2,3,4]
matplotlib.pyplot.plot(x,y)
matplotlib.pyplot.show()
The x axis' ticks are plotted in intervals of 5. Is there a way to make it show intervals of 1?
You could explicitly set where you want to tick marks with plt.xticks:
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
For example,
import numpy as np
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.show()
(np.arange was used rather than Python's range function just in case min(x) and max(x) are floats instead of ints.)
The plt.plot (or ax.plot) function will automatically set default x and y limits. If you wish to keep those limits, and just change the stepsize of the tick marks, then you could use ax.get_xlim() to discover what limits Matplotlib has already set.
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
The default tick formatter should do a decent job rounding the tick values to a sensible number of significant digits. However, if you wish to have more control over the format, you can define your own formatter. For example,
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
Here's a runnable example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 0.712123))
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
plt.show()
Another approach is to set the axis locator:
import matplotlib.ticker as plticker
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
There are several different types of locator depending upon your needs.
Here is a full example:
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
plt.show()
I like this solution (from the Matplotlib Plotting Cookbook):
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 1
fig, ax = plt.subplots(1,1)
ax.plot(x,y)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
This solution give you explicit control of the tick spacing via the number given to ticker.MultipleLocater(), allows automatic limit determination, and is easy to read later.
In case anyone is interested in a general one-liner, simply get the current ticks and use it to set the new ticks by sampling every other tick.
ax.set_xticks(ax.get_xticks()[::2])
if you just want to set the spacing a simple one liner with minimal boilerplate:
plt.gca().xaxis.set_major_locator(plt.MultipleLocator(1))
also works easily for minor ticks:
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
a bit of a mouthfull, but pretty compact
This is a bit hacky, but by far the cleanest/easiest to understand example that I've found to do this. It's from an answer on SO here:
Cleanest way to hide every nth tick label in matplotlib colorbar?
for label in ax.get_xticklabels()[::2]:
label.set_visible(False)
Then you can loop over the labels setting them to visible or not depending on the density you want.
edit: note that sometimes matplotlib sets labels == '', so it might look like a label is not present, when in fact it is and just isn't displaying anything. To make sure you're looping through actual visible labels, you could try:
visible_labels = [lab for lab in ax.get_xticklabels() if lab.get_visible() is True and lab.get_text() != '']
plt.setp(visible_labels[::2], visible=False)
This is an old topic, but I stumble over this every now and then and made this function. It's very convenient:
import matplotlib.pyplot as pp
import numpy as np
def resadjust(ax, xres=None, yres=None):
"""
Send in an axis and I fix the resolution as desired.
"""
if xres:
start, stop = ax.get_xlim()
ticks = np.arange(start, stop + xres, xres)
ax.set_xticks(ticks)
if yres:
start, stop = ax.get_ylim()
ticks = np.arange(start, stop + yres, yres)
ax.set_yticks(ticks)
One caveat of controlling the ticks like this is that one does no longer enjoy the interactive automagic updating of max scale after an added line. Then do
gca().set_ylim(top=new_top) # for example
and run the resadjust function again.
I developed an inelegant solution. Consider that we have the X axis and also a list of labels for each point in X.
Example:
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
y = [10,20,15,18,7,19]
xlabels = ['jan','feb','mar','apr','may','jun']
Let's say that I want to show ticks labels only for 'feb' and 'jun'
xlabelsnew = []
for i in xlabels:
if i not in ['feb','jun']:
i = ' '
xlabelsnew.append(i)
else:
xlabelsnew.append(i)
Good, now we have a fake list of labels. First, we plotted the original version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabels,rotation=45)
plt.show()
Now, the modified version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabelsnew,rotation=45)
plt.show()
Pure Python Implementation
Below's a pure python implementation of the desired functionality that handles any numeric series (int or float) with positive, negative, or mixed values and allows for the user to specify the desired step size:
import math
def computeTicks (x, step = 5):
"""
Computes domain with given step encompassing series x
# params
x - Required - A list-like object of integers or floats
step - Optional - Tick frequency
"""
xMax, xMin = math.ceil(max(x)), math.floor(min(x))
dMax, dMin = xMax + abs((xMax % step) - step) + (step if (xMax % step != 0) else 0), xMin - abs((xMin % step))
return range(dMin, dMax, step)
Sample Output
# Negative to Positive
series = [-2, 18, 24, 29, 43]
print(list(computeTicks(series)))
[-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
# Negative to 0
series = [-30, -14, -10, -9, -3, 0]
print(list(computeTicks(series)))
[-30, -25, -20, -15, -10, -5, 0]
# 0 to Positive
series = [19, 23, 24, 27]
print(list(computeTicks(series)))
[15, 20, 25, 30]
# Floats
series = [1.8, 12.0, 21.2]
print(list(computeTicks(series)))
[0, 5, 10, 15, 20, 25]
# Step – 100
series = [118.3, 293.2, 768.1]
print(list(computeTicks(series, step = 100)))
[100, 200, 300, 400, 500, 600, 700, 800]
Sample Usage
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(computeTicks(x))
plt.show()
Notice the x-axis has integer values all evenly spaced by 5, whereas the y-axis has a different interval (the matplotlib default behavior, because the ticks weren't specified).
Generalisable one liner, with only Numpy imported:
ax.set_xticks(np.arange(min(x),max(x),1))
Set in the context of the question:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0,5,9,10,15]
y = [0,1,2,3,4]
ax.plot(x,y)
ax.set_xticks(np.arange(min(x),max(x),1))
plt.show()
How it works:
fig, ax = plt.subplots() gives the ax object which contains the axes.
np.arange(min(x),max(x),1) gives an array of interval 1 from the min of x to the max of x. This is the new x ticks that we want.
ax.set_xticks() changes the ticks on the ax object.
xmarks=[i for i in range(1,length+1,1)]
plt.xticks(xmarks)
This worked for me
if you want ticks between [1,5] (1 and 5 inclusive) then replace
length = 5
Since None of the above solutions worked for my usecase, here I provide a solution using None (pun!) which can be adapted to a wide variety of scenarios.
Here is a sample piece of code that produces cluttered ticks on both X and Y axes.
# Note the super cluttered ticks on both X and Y axis.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x) # set xtick values
ax.set_yticks(y) # set ytick values
plt.show()
Now, we clean up the clutter with a new plot that shows only a sparse set of values on both x and y axes as ticks.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yticks(y)
# which values need to be shown?
# here, we show every third value from `x` and `y`
show_every = 3
sparse_xticks = [None] * x.shape[0]
sparse_xticks[::show_every] = x[::show_every]
sparse_yticks = [None] * y.shape[0]
sparse_yticks[::show_every] = y[::show_every]
ax.set_xticklabels(sparse_xticks, fontsize=6) # set sparse xtick values
ax.set_yticklabels(sparse_yticks, fontsize=6) # set sparse ytick values
plt.show()
Depending on the usecase, one can adapt the above code simply by changing show_every and using that for sampling tick values for X or Y or both the axes.
If this stepsize based solution doesn't fit, then one can also populate the values of sparse_xticks or sparse_yticks at irregular intervals, if that is what is desired.
You can loop through labels and show or hide those you want:
for i, label in enumerate(ax.get_xticklabels()):
if i % interval != 0:
label.set_visible(False)

Pandas: plot a dataframe with on its right side rectangle colored according to an array's values

I have a dataframe with 100 rows and 4 columns. I have an array (size 100,1) filled with values spanning between 0 and 1. I would like to plot my dataframe, with on its right side a rectangle which will take a color depending on the value of the array at a specific row (see the poor drawing I made, the array is written to help understanding what I want). I would like the colors to be a gradient, where 0 = dark blue, and 1 = bright red.
I know how to create a colormap, but this is slightly different.
Which function do you advise me to use ?
Here is some code I use for the plotting:
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
rectangle_values = np.random.rand(100)
plt.figure(figsize=(15,15))
ax = sns.heatmap(df, cbar = None)
)
My solution would be to use plot.subplots to create two plots with the width_ratios argument as something like 19:1. On the left hand side you plot the data frame as usual, on the right hand side you plot the vector. Notice that I am using vmin and vmax to set the boundaries as required (0, 1) for the vector. Also, for the requested colors, I'm using MatPlotLib's RdBu (Red and Blue map), but it was needed to reverse it in order to meet your requirements. You can confirm the colors by the values, on this run the generated random values were [0.74, 0.96, 0.87, 0.50, 0.26].
df = pd.DataFrame(np.random.randint(0,5,size=(5, 4)), columns=list('ABCD'))
rectangle_values = pd.DataFrame(np.random.rand(5), columns=['foo'])
plt.subplots(1, 2, gridspec_kw={'width_ratios': [19, 1]})
plt.subplot(1, 2, 1)
sns.heatmap(df, cbar = None)
plt.subplot(1, 2, 2)
sns.heatmap(rectangle_values, cbar = None, cmap=plt.cm.get_cmap('RdBu').reversed(), vmin=0, vmax=1)
plt.show()
And the output is:

how to increase space between bar and increase bar width in matplotlib

i am web-scraping a wikipedia table directly from wikipedia website and plot the table. i want to increase the bar width, add space between the bars and make all bars visible. pls how can i do? my code below
#########scrapping#########
html= requests.get("https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria")
bsObj= BeautifulSoup(html.content, 'html.parser')
states= []
cases=[]
for items in bsObj.find("table",{"class":"wikitable sortable"}).find_all('tr')[1:37]:
data = items.find_all(['th',{"align":"left"},'td'])
states.append(data[0].a.text)
cases.append(data[1].b.text)
########Dataframe#########
table= ["STATES","CASES"]
tab= pd.DataFrame(list(zip(states,cases)),columns=table)
tab["CASES"]=tab["CASES"].replace('\n','', regex=True)
tab["CASES"]=tab["CASES"].replace(',','', regex=True)
tab['CASES'] = pd.to_numeric(tab['CASES'], errors='coerce')
tab["CASES"]=tab["CASES"].fillna(0)
tab["CASES"] = tab["CASES"].values.astype(int)
#######matplotlib########
x=tab["STATES"]
y=tab["CASES"]
plt.cla()
plt.locator_params(axis='y', nbins=len(y)/4)
plt.bar(x,y, color="blue")
plt.xticks(fontsize= 8,rotation='vertical')
plt.yticks(fontsize= 8)
plt.show()
Use pandas.read_html and barh
.read_html will read all tables tags from a website and return a list of dataframes.
barh will make horizontal instead of vertical bars, which is useful if there are a lot of bars.
Make the plot longer, if needed. In this case, (16.0, 10.0), increase 10.
I'd recommend using a log scale for x, because Lagos has so many cases compared to Kogi
This doesn't put more space between the bars, but the formatted plot is more legible with its increased dimensions and horizontal bars.
.iloc[:36, :5] removes some unneeded columns and rows from the dataframe.
import pandas as pd
import matplotlib.pyplot as plt
# url
url = 'https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria'
# create dataframe list
dataframe_list = pd.read_html(url) # this is a list of all the tables at the url as dataframes
# get the dataframe from the list
df = dataframe_list[2].iloc[:36, :5] # you want the dataframe at index 2
# replace '-' with 0
df.replace('–', 0, inplace=True)
# set to int
for col in df.columns[1:]:
df[col] = df[col].astype('int')
# plot a horizontal bar
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.style.use('ggplot')
p = plt.barh(width='Cases', y='State', data=df, color='purple')
plt.xscale('log')
plt.xlabel('Number of Cases')
plt.show()
Plot all the data in df
df.set_index('State', inplace=True)
plt.figure(figsize=(14, 14))
df.plot.barh()
plt.xscale('log')
plt.show()
4 subplots
State as index
plt.figure(figsize=(14, 14))
for i, col in enumerate(df.columns, 1):
plt.subplot(2, 2, i)
df[col].plot.barh(label=col, color='green')
plt.xscale('log')
plt.legend()
plt.tight_layout()
plt.show()

Matplotlib plot labels overlap [duplicate]

I am trying to fix how python plots my data.
Say:
x = [0,5,9,10,15]
y = [0,1,2,3,4]
matplotlib.pyplot.plot(x,y)
matplotlib.pyplot.show()
The x axis' ticks are plotted in intervals of 5. Is there a way to make it show intervals of 1?
You could explicitly set where you want to tick marks with plt.xticks:
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
For example,
import numpy as np
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.show()
(np.arange was used rather than Python's range function just in case min(x) and max(x) are floats instead of ints.)
The plt.plot (or ax.plot) function will automatically set default x and y limits. If you wish to keep those limits, and just change the stepsize of the tick marks, then you could use ax.get_xlim() to discover what limits Matplotlib has already set.
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
The default tick formatter should do a decent job rounding the tick values to a sensible number of significant digits. However, if you wish to have more control over the format, you can define your own formatter. For example,
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
Here's a runnable example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 0.712123))
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
plt.show()
Another approach is to set the axis locator:
import matplotlib.ticker as plticker
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
There are several different types of locator depending upon your needs.
Here is a full example:
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
plt.show()
I like this solution (from the Matplotlib Plotting Cookbook):
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 1
fig, ax = plt.subplots(1,1)
ax.plot(x,y)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
This solution give you explicit control of the tick spacing via the number given to ticker.MultipleLocater(), allows automatic limit determination, and is easy to read later.
In case anyone is interested in a general one-liner, simply get the current ticks and use it to set the new ticks by sampling every other tick.
ax.set_xticks(ax.get_xticks()[::2])
if you just want to set the spacing a simple one liner with minimal boilerplate:
plt.gca().xaxis.set_major_locator(plt.MultipleLocator(1))
also works easily for minor ticks:
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
a bit of a mouthfull, but pretty compact
This is a bit hacky, but by far the cleanest/easiest to understand example that I've found to do this. It's from an answer on SO here:
Cleanest way to hide every nth tick label in matplotlib colorbar?
for label in ax.get_xticklabels()[::2]:
label.set_visible(False)
Then you can loop over the labels setting them to visible or not depending on the density you want.
edit: note that sometimes matplotlib sets labels == '', so it might look like a label is not present, when in fact it is and just isn't displaying anything. To make sure you're looping through actual visible labels, you could try:
visible_labels = [lab for lab in ax.get_xticklabels() if lab.get_visible() is True and lab.get_text() != '']
plt.setp(visible_labels[::2], visible=False)
This is an old topic, but I stumble over this every now and then and made this function. It's very convenient:
import matplotlib.pyplot as pp
import numpy as np
def resadjust(ax, xres=None, yres=None):
"""
Send in an axis and I fix the resolution as desired.
"""
if xres:
start, stop = ax.get_xlim()
ticks = np.arange(start, stop + xres, xres)
ax.set_xticks(ticks)
if yres:
start, stop = ax.get_ylim()
ticks = np.arange(start, stop + yres, yres)
ax.set_yticks(ticks)
One caveat of controlling the ticks like this is that one does no longer enjoy the interactive automagic updating of max scale after an added line. Then do
gca().set_ylim(top=new_top) # for example
and run the resadjust function again.
I developed an inelegant solution. Consider that we have the X axis and also a list of labels for each point in X.
Example:
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
y = [10,20,15,18,7,19]
xlabels = ['jan','feb','mar','apr','may','jun']
Let's say that I want to show ticks labels only for 'feb' and 'jun'
xlabelsnew = []
for i in xlabels:
if i not in ['feb','jun']:
i = ' '
xlabelsnew.append(i)
else:
xlabelsnew.append(i)
Good, now we have a fake list of labels. First, we plotted the original version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabels,rotation=45)
plt.show()
Now, the modified version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabelsnew,rotation=45)
plt.show()
Pure Python Implementation
Below's a pure python implementation of the desired functionality that handles any numeric series (int or float) with positive, negative, or mixed values and allows for the user to specify the desired step size:
import math
def computeTicks (x, step = 5):
"""
Computes domain with given step encompassing series x
# params
x - Required - A list-like object of integers or floats
step - Optional - Tick frequency
"""
xMax, xMin = math.ceil(max(x)), math.floor(min(x))
dMax, dMin = xMax + abs((xMax % step) - step) + (step if (xMax % step != 0) else 0), xMin - abs((xMin % step))
return range(dMin, dMax, step)
Sample Output
# Negative to Positive
series = [-2, 18, 24, 29, 43]
print(list(computeTicks(series)))
[-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
# Negative to 0
series = [-30, -14, -10, -9, -3, 0]
print(list(computeTicks(series)))
[-30, -25, -20, -15, -10, -5, 0]
# 0 to Positive
series = [19, 23, 24, 27]
print(list(computeTicks(series)))
[15, 20, 25, 30]
# Floats
series = [1.8, 12.0, 21.2]
print(list(computeTicks(series)))
[0, 5, 10, 15, 20, 25]
# Step – 100
series = [118.3, 293.2, 768.1]
print(list(computeTicks(series, step = 100)))
[100, 200, 300, 400, 500, 600, 700, 800]
Sample Usage
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(computeTicks(x))
plt.show()
Notice the x-axis has integer values all evenly spaced by 5, whereas the y-axis has a different interval (the matplotlib default behavior, because the ticks weren't specified).
Generalisable one liner, with only Numpy imported:
ax.set_xticks(np.arange(min(x),max(x),1))
Set in the context of the question:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0,5,9,10,15]
y = [0,1,2,3,4]
ax.plot(x,y)
ax.set_xticks(np.arange(min(x),max(x),1))
plt.show()
How it works:
fig, ax = plt.subplots() gives the ax object which contains the axes.
np.arange(min(x),max(x),1) gives an array of interval 1 from the min of x to the max of x. This is the new x ticks that we want.
ax.set_xticks() changes the ticks on the ax object.
xmarks=[i for i in range(1,length+1,1)]
plt.xticks(xmarks)
This worked for me
if you want ticks between [1,5] (1 and 5 inclusive) then replace
length = 5
Since None of the above solutions worked for my usecase, here I provide a solution using None (pun!) which can be adapted to a wide variety of scenarios.
Here is a sample piece of code that produces cluttered ticks on both X and Y axes.
# Note the super cluttered ticks on both X and Y axis.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x) # set xtick values
ax.set_yticks(y) # set ytick values
plt.show()
Now, we clean up the clutter with a new plot that shows only a sparse set of values on both x and y axes as ticks.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yticks(y)
# which values need to be shown?
# here, we show every third value from `x` and `y`
show_every = 3
sparse_xticks = [None] * x.shape[0]
sparse_xticks[::show_every] = x[::show_every]
sparse_yticks = [None] * y.shape[0]
sparse_yticks[::show_every] = y[::show_every]
ax.set_xticklabels(sparse_xticks, fontsize=6) # set sparse xtick values
ax.set_yticklabels(sparse_yticks, fontsize=6) # set sparse ytick values
plt.show()
Depending on the usecase, one can adapt the above code simply by changing show_every and using that for sampling tick values for X or Y or both the axes.
If this stepsize based solution doesn't fit, then one can also populate the values of sparse_xticks or sparse_yticks at irregular intervals, if that is what is desired.
You can loop through labels and show or hide those you want:
for i, label in enumerate(ax.get_xticklabels()):
if i % interval != 0:
label.set_visible(False)

How to color bars who make up 50% of the data?

I am plotting a histogram for some data points with bar heights being the percentage of that bin from the whole data:
x = normal(size=1000)
hist, bins = np.histogram(x, bins=20)
plt.bar(bins[:-1], hist.astype(np.float32) / hist.sum(), width=(bins[1]-bins[0]), alpha=0.6)
The result is:
I would like all bars that sum up to be 50% of the data to be in a different color, for example:
(I selected the colored bars without actually checking whether their sum adds to 50%)
Any suggestions how to accomplish this?
Here is how you can plot the first half of the bins with a different color, this looks like your mock, but I am not sure it complies to %50 of the data (it is not clear to me what do you mean by that).
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
fig = plt.figure()
ax = fig.add_subplot(111)
# the histogram of the data
n, bins, patches = ax.hist(x, 50, normed=1, facecolor='green', alpha=0.75)
# now that we found the index we color all the beans smaller than middle index
for p in patches[:len(bins)/2]:
p.set_facecolor('red')
# hist uses np.histogram under the hood to create 'n' and 'bins'.
# np.histogram returns the bin edges, so there will be 50 probability
# density values in n, 51 bin edges in bins and 50 patches. To get
# everything lined up, we'll compute the bin centers
bincenters = 0.5*(bins[1:]+bins[:-1])
# add a 'best fit' line for the normal PDF
y = mlab.normpdf( bincenters, mu, sigma)
l = ax.plot(bincenters, y, 'r--', linewidth=1)
ax.set_xlabel('Smarts')
ax.set_ylabel('Probability')
ax.set_xlim(40, 160)
ax.set_ylim(0, 0.03)
ax.grid(True)
plt.show()
And the output is:
update
The key method you want to look at is patch.set_set_facecolor. You have to understand that almost everything you plot inside the axes object is a Patch, and as such it has this method, here is another example, I arbitrary choose the first 3 bars to have another color, you can choose based on what ever you decide:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
## the data
N = 5
menMeans = [18, 35, 30, 35, 27]
## necessary variables
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars
## the bars
rects1 = ax.bar(ind, menMeans, width,
color='black',
error_kw=dict(elinewidth=2,ecolor='red'))
for patch in rects1.patches[:3]:
patch.set_facecolor('red')
ax.set_xlim(-width,len(ind)+width)
ax.set_ylim(0,45)
ax.set_ylabel('Scores')
xTickMarks = ['Group'+str(i) for i in range(1,6)]
ax.set_xticks(ind)
xtickNames = ax.set_xticklabels(xTickMarks)
plt.setp(xtickNames, rotation=45, fontsize=10)
plt.show()