Replicating O*NET OnLine charts with matplotlib

Given the 10%, 25%, Median, 75%, and 90% income levels for a given position, I'd like to replicate the annual wage chart style from O*NET for an internal document.
Example chart: https://www.onetonline.org/link/localwages/41-1011.00?st=OH
Since the main feature I want is a smoother chart, I attempted to interpolate, but the result behaves oddly: the curve dips below zero and then comes back up, which is definitely not my intent.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
y = [10, 25, 50, 25, 10]
x = [25150, 29990, 37910, 49160, 70160]
x_a = np.array(x)
y_a = np.array(y)
x_ = np.linspace(x_a.min(), x_a.max(), 50)
spline = make_interp_spline(x, y)  # cubic B-spline by default, which can overshoot
y_ = spline(x_)
plt.plot(x_, y_)
plt.show()
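One possible way around the overshoot, offered as a sketch rather than a definitive answer: swap the cubic spline for SciPy's shape-preserving PchipInterpolator, which does not overshoot between data points. The data comes from the question; the variable names, the filled area and the percentile tick labels are my own assumptions about what the O*NET chart looks like.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import PchipInterpolator

# Data from the question: wage levels (x) and the heights used for the curve (y).
wages = np.array([25150, 29990, 37910, 49160, 70160])
heights = np.array([10, 25, 50, 25, 10])

# PCHIP is monotone between neighbouring data points, so the curve
# stays inside the range of the data instead of dipping below it.
pchip = PchipInterpolator(wages, heights)
x_smooth = np.linspace(wages.min(), wages.max(), 200)
y_smooth = pchip(x_smooth)

fig, ax = plt.subplots()
ax.plot(x_smooth, y_smooth)
ax.fill_between(x_smooth, y_smooth, alpha=0.3)  # assumed styling
ax.set_xticks(wages)
ax.set_xticklabels(["10%", "25%", "Median", "75%", "90%"])  # assumed labels
ax.set_ylim(bottom=0)
# plt.show()  # uncomment when running interactively
```

The trade-off is that PCHIP is only once continuously differentiable, so the curve can look slightly less rounded at the data points than a cubic spline.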

Related

How to adjust x-axis spacing on distplot? [duplicate]

I am trying to fix how python plots my data.
Say:
x = [0,5,9,10,15]
y = [0,1,2,3,4]
matplotlib.pyplot.plot(x,y)
matplotlib.pyplot.show()
The x axis' ticks are plotted in intervals of 5. Is there a way to make it show intervals of 1?

You could explicitly set where you want the tick marks with plt.xticks:
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
For example,
import numpy as np
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.show()
(np.arange was used rather than Python's range function just in case min(x) and max(x) are floats instead of ints.)
The plt.plot (or ax.plot) function will automatically set default x and y limits. If you wish to keep those limits, and just change the stepsize of the tick marks, then you could use ax.get_xlim() to discover what limits Matplotlib has already set.
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
The default tick formatter should do a decent job rounding the tick values to a sensible number of significant digits. However, if you wish to have more control over the format, you can define your own formatter. For example,
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
Here's a runnable example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 0.712123))
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
plt.show()

Another approach is to set the axis locator:
import matplotlib.ticker as plticker
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
There are several different types of locator depending upon your needs.
Here is a full example:
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
plt.show()
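As an illustration of the other locator types mentioned above, here is a small sketch using MaxNLocator for the major ticks and AutoMinorLocator for the minor ticks. The choice of nbins=4 and five minor subdivisions is arbitrary and only for demonstration.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker

x = [0, 5, 9, 10, 15]
y = [0, 1, 2, 3, 4]
fig, ax = plt.subplots()
ax.plot(x, y)

# At most 4 intervals on the x axis, with ticks restricted to integer values.
ax.xaxis.set_major_locator(plticker.MaxNLocator(nbins=4, integer=True))
# Split each major interval into 5 minor intervals.
ax.xaxis.set_minor_locator(plticker.AutoMinorLocator(5))
# plt.show()  # uncomment when running interactively
```

Unlike a fixed list of ticks, locators keep working when the axis limits change, for example after zooming.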

I like this solution (from the Matplotlib Plotting Cookbook):
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 1
fig, ax = plt.subplots(1,1)
ax.plot(x,y)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
This solution gives you explicit control of the tick spacing via the number given to ticker.MultipleLocator(), allows automatic limit determination, and is easy to read later.

In case anyone is interested in a general one-liner, simply get the current ticks and use it to set the new ticks by sampling every other tick.
ax.set_xticks(ax.get_xticks()[::2])

If you just want to set the spacing, here is a simple one-liner with minimal boilerplate:
plt.gca().xaxis.set_major_locator(plt.MultipleLocator(1))
It also works easily for minor ticks:
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
A bit of a mouthful, but pretty compact.

This is a bit hacky, but by far the cleanest/easiest to understand example that I've found to do this. It's from an answer on SO here:
Cleanest way to hide every nth tick label in matplotlib colorbar?
for label in ax.get_xticklabels()[::2]:
    label.set_visible(False)
Then you can loop over the labels setting them to visible or not depending on the density you want.
edit: note that sometimes matplotlib sets labels == '', so it might look like a label is not present, when in fact it is and just isn't displaying anything. To make sure you're looping through actual visible labels, you could try:
visible_labels = [lab for lab in ax.get_xticklabels() if lab.get_visible() is True and lab.get_text() != '']
plt.setp(visible_labels[::2], visible=False)

This is an old topic, but I stumble over this every now and then and made this function. It's very convenient:
import matplotlib.pyplot as pp
import numpy as np
def resadjust(ax, xres=None, yres=None):
    """
    Send in an axis and I fix the resolution as desired.
    """
    if xres:
        start, stop = ax.get_xlim()
        ticks = np.arange(start, stop + xres, xres)
        ax.set_xticks(ticks)
    if yres:
        start, stop = ax.get_ylim()
        ticks = np.arange(start, stop + yres, yres)
        ax.set_yticks(ticks)
One caveat of controlling the ticks like this is that you no longer get the interactive, automatic updating of the maximum scale after adding a line. In that case, do
pp.gca().set_ylim(top=new_top) # for example
and run the resadjust function again.

I developed an inelegant solution. Consider that we have the X axis and also a list of labels for each point in X.
Example:
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
y = [10,20,15,18,7,19]
xlabels = ['jan','feb','mar','apr','may','jun']
Let's say that I want to show tick labels only for 'feb' and 'jun':
xlabelsnew = []
for i in xlabels:
    if i not in ['feb','jun']:
        i = ' '
        xlabelsnew.append(i)
    else:
        xlabelsnew.append(i)
Good, now we have a fake list of labels. First, we plot the original version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabels,rotation=45)
plt.show()
Now, the modified version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabelsnew,rotation=45)
plt.show()
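The same idea can be written more compactly with a list comprehension. This is just a sketch of the approach above; the name keep is my own and not part of the original answer.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [10, 20, 15, 18, 7, 19]
xlabels = ['jan', 'feb', 'mar', 'apr', 'may', 'jun']

keep = {'feb', 'jun'}  # hypothetical name: labels that should stay visible
# Replace every label that is not in `keep` with an empty string.
xlabelsnew = [label if label in keep else '' for label in xlabels]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xticks(x)
ax.set_xticklabels(xlabelsnew, rotation=45)
# plt.show()  # uncomment when running interactively
```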

Pure Python Implementation
Below's a pure python implementation of the desired functionality that handles any numeric series (int or float) with positive, negative, or mixed values and allows for the user to specify the desired step size:
import math
def computeTicks (x, step = 5):
    """
    Computes domain with given step encompassing series x
    # params
    x - Required - A list-like object of integers or floats
    step - Optional - Tick frequency
    """
    xMax, xMin = math.ceil(max(x)), math.floor(min(x))
    dMax, dMin = xMax + abs((xMax % step) - step) + (step if (xMax % step != 0) else 0), xMin - abs((xMin % step))
    return range(dMin, dMax, step)
Sample Output
# Negative to Positive
series = [-2, 18, 24, 29, 43]
print(list(computeTicks(series)))
[-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
# Negative to 0
series = [-30, -14, -10, -9, -3, 0]
print(list(computeTicks(series)))
[-30, -25, -20, -15, -10, -5, 0]
# 0 to Positive
series = [19, 23, 24, 27]
print(list(computeTicks(series)))
[15, 20, 25, 30]
# Floats
series = [1.8, 12.0, 21.2]
print(list(computeTicks(series)))
[0, 5, 10, 15, 20, 25]
# Step – 100
series = [118.3, 293.2, 768.1]
print(list(computeTicks(series, step = 100)))
[100, 200, 300, 400, 500, 600, 700, 800]
Sample Usage
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(computeTicks(x))
plt.show()
Notice the x-axis has integer values all evenly spaced by 5, whereas the y-axis has a different interval (the matplotlib default behavior, because the ticks weren't specified).
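For comparison, here is a shorter sketch of the same idea: round the minimum down and the maximum up to the nearest multiple of the step. The function name compute_ticks_simple is my own, and it assumes step is a positive integer; for the sample series shown above it gives the same results as computeTicks.

```python
import math

def compute_ticks_simple(x, step=5):
    """Return a range of multiples of `step` that encloses the series `x`.

    Assumes `step` is a positive integer.
    """
    lo = math.floor(min(x) / step) * step  # largest multiple of step <= min(x)
    hi = math.ceil(max(x) / step) * step   # smallest multiple of step >= max(x)
    return range(lo, hi + step, step)      # + step so that `hi` itself is included

print(list(compute_ticks_simple([-2, 18, 24, 29, 43])))
print(list(compute_ticks_simple([118.3, 293.2, 768.1], step=100)))
```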

Generalisable one-liner, with only NumPy imported:
ax.set_xticks(np.arange(min(x), max(x) + 1, 1))
Set in the context of the question:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0,5,9,10,15]
y = [0,1,2,3,4]
ax.plot(x,y)
ax.set_xticks(np.arange(min(x), max(x) + 1, 1))
plt.show()
How it works:
fig, ax = plt.subplots() gives the ax object which contains the axes.
np.arange(min(x), max(x) + 1, 1) gives an array with step 1 from the min of x up to and including the max of x (np.arange excludes its stop value, hence the + 1). These are the new x ticks that we want.
ax.set_xticks() changes the ticks on the ax object.

This worked for me:
length = 5
xmarks = [i for i in range(1, length + 1, 1)]
plt.xticks(xmarks)
With length = 5 this puts ticks at 1 through 5 inclusive; change length to get a different range.

Since None of the above solutions worked for my usecase, here I provide a solution using None (pun!) which can be adapted to a wide variety of scenarios.
Here is a sample piece of code that produces cluttered ticks on both X and Y axes.
# Note the super cluttered ticks on both X and Y axis.
import numpy as np
import matplotlib.pyplot as plt
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x) # set xtick values
ax.set_yticks(y) # set ytick values
plt.show()
Now, we clean up the clutter with a new plot that shows only a sparse set of values on both x and y axes as ticks.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yticks(y)
# which values need to be shown?
# here, we show every third value from `x` and `y`
show_every = 3
sparse_xticks = [None] * x.shape[0]
sparse_xticks[::show_every] = x[::show_every]
sparse_yticks = [None] * y.shape[0]
sparse_yticks[::show_every] = y[::show_every]
ax.set_xticklabels(sparse_xticks, fontsize=6) # set sparse xtick values
ax.set_yticklabels(sparse_yticks, fontsize=6) # set sparse ytick values
plt.show()
Depending on the usecase, one can adapt the above code simply by changing show_every and using that for sampling tick values for X or Y or both the axes.
If this stepsize based solution doesn't fit, then one can also populate the values of sparse_xticks or sparse_yticks at irregular intervals, if that is what is desired.
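If the unlabeled tick marks are not needed at all, a simpler variant (a sketch, not the method described above) is to place ticks only at the sampled positions, so no None placeholders are required. The rounding of the y labels to one decimal is my own choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# inputs
x = np.arange(1, 101)
y = x * np.log(x)
show_every = 3

fig, ax = plt.subplots()
ax.plot(x, y)
# Put ticks only at every third data point; the default formatter labels them.
ax.set_xticks(x[::show_every])
ax.set_yticks(y[::show_every])
ax.set_yticklabels([f"{v:.1f}" for v in y[::show_every]])
ax.tick_params(labelsize=6)
# plt.show()  # uncomment when running interactively
```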

You can loop through labels and show or hide those you want:
# interval must be defined beforehand, e.g. interval = 2 keeps every second label
for i, label in enumerate(ax.get_xticklabels()):
    if i % interval != 0:
        label.set_visible(False)

How to enlarge a Matplotlib group bar diagram

I have a group bar chart that I would like to scale.
I am running matplotlib in a Jupyter Notebook and the bar chart is very squashed. I would like to make the axis bigger but can't get it to work in a group bar chart. If I could make it wider it would be much more readable. But if I just increase "width" then the bars start to overlap each other.
The second problem is what to do about the labels. How can the labels be printed to three decimal places?
Note: I recognise that the values plotted are orders of magnitude different, so you cannot really read the small values. Ordinarily you would not combine these onto a single chart, but this is a class exercise to demonstrate why you would not do it, so I expect that.
Here is the self-contained code to demonstrate the problem:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70-79', '80-89', '90-99']
t3=[1.2833333333333332, 1.6970588235294117, 1.7189655172413794, 1.8090163934426229, 1.44140625, 1.5763157894736846, 1.3685185185185187, 1.430120481927711, 1.5352941176470587, 1.9]
tt4= [116.33333333333333, 106.0, 106.93103448275862, 109.47540983606558, 98.734375, 99.84210526315789, 96.72839506172839, 99.40963855421687, 104.94117647058823, 203.0]
tsh= [1.2833333333333332, 1.6970588235294117, 1.7189655172413794, 1.8090163934426229, 1.44140625, 1.5763157894736846, 1.3685185185185187, 1.430120481927711, 1.5352941176470587, 1.9]
hypo_count= [2, 15, 55, 58, 59, 69, 72, 74, 33, 1]
x = np.arange(len(labels)) # the label locations
width = 0.2 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x, t3, width, label='T3 avg')
rects2 = ax.bar(x+(width), tt4, width, label='TT4 avg')
rects3 = ax.bar(x+(width*2), tsh, width, label='TSH avg')
rects4 = ax.bar(x+(width*3), hypo_count, width, label='# Hypothyroid +ve')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_title('Age Bracket')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
# Print the value on top of each bar
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
ax.bar_label(rects3, padding=3)
ax.bar_label(rects4, padding=3)
fig.tight_layout()
plt.show()
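One way to address both points, offered as a sketch: pass figsize to plt.subplots to widen the figure instead of widening the bars, and pass fmt to ax.bar_label to control the number of decimals. The figure size of 16 by 6 inches and the font size are arbitrary choices, and the data is rounded from the question to keep the listing short.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ['0-9', '10-19', '20-29', '30-39', '40-49',
          '50-59', '60-69', '70-79', '80-89', '90-99']
# Values from the question, rounded to four decimals for brevity.
t3 = [1.2833, 1.6971, 1.7190, 1.8090, 1.4414, 1.5763, 1.3685, 1.4301, 1.5353, 1.9]
tt4 = [116.3333, 106.0, 106.9310, 109.4754, 98.7344,
       99.8421, 96.7284, 99.4096, 104.9412, 203.0]
tsh = list(t3)  # identical to t3 in the question's data
hypo_count = [2, 15, 55, 58, 59, 69, 72, 74, 33, 1]
series = {'T3 avg': t3, 'TT4 avg': tt4, 'TSH avg': tsh,
          '# Hypothyroid +ve': hypo_count}

x = np.arange(len(labels))  # the label locations
width = 0.2                 # bar width stays the same, so bars do not overlap

# A wider figure gives each group more room.
fig, ax = plt.subplots(figsize=(16, 6))
label_texts = []
for i, (name, values) in enumerate(series.items()):
    rects = ax.bar(x + i * width, values, width, label=name)
    # fmt controls the label format; '%.3f' prints three decimal places.
    label_texts.append(ax.bar_label(rects, fmt='%.3f', padding=3, fontsize=7))

ax.set_title('Age Bracket')
ax.set_xticks(x + 1.5 * width)  # centre the tick under each group of four bars
ax.set_xticklabels(labels)
ax.legend()
fig.tight_layout()
# plt.show()  # uncomment when running interactively
```

Note that the same format is applied to the counts too, so 2 is printed as 2.000; use a different fmt for that series if integers are preferred.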

How to create a heatmap with marginal histograms, similar to a jointplot?

I want to plot 2-dimensional scalar data, which I would usually plot using matplotlib.pyplot.imshow or sns.heatmap. Consider this example:
import matplotlib.pyplot as plt
data = [[10, 20, 30], [50, 50, 100], [80, 60, 10]]
fig, ax = plt.subplots()
ax.imshow(data, cmap=plt.cm.YlGn)
Now I would additionally like to have one-dimensional bar plots at the top and the right side, showing the sum of the values in each column / row, just as sns.jointplot does. However, sns.jointplot seems only to work with categorical data, producing histograms (with kind='hist'), scatterplots or the like. I don't see how to use it if I want to specify the values of the cells directly. Is such a thing possible with seaborn?
The y axis in my plot is going to be days (within a month), the x axis is going to be hours. The field Cost Difference in my data is what should determine the shade of the respective cell in the plot.

Here is an approach that first creates a dummy jointplot and then uses its axes to add a heatmap and bar plots of the sums.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
D = 28
H = 24
df = pd.DataFrame({'day': np.repeat(range(1, D + 1), H),
                   'hour': np.tile(range(H), D),
                   'Cost Dif.': np.random.uniform(10, 1000, D * H)})
# change the random df to have some rows/columns stand out (debugging, checking)
df.loc[df['hour'] == 10, 'Cost Dif.'] = 150
df.loc[df['hour'] == 12, 'Cost Dif.'] = 250
df.loc[df['day'] == 20, 'Cost Dif.'] = 800
g = sns.jointplot(data=df, x='day', y='hour', kind='hist', bins=(D, H))
g.ax_marg_y.cla()
g.ax_marg_x.cla()
sns.heatmap(data=df['Cost Dif.'].to_numpy().reshape(D, H).T, ax=g.ax_joint, cbar=False, cmap='Blues')
g.ax_marg_y.barh(np.arange(0.5, H), df.groupby(['hour'])['Cost Dif.'].sum().to_numpy(), color='navy')
g.ax_marg_x.bar(np.arange(0.5, D), df.groupby(['day'])['Cost Dif.'].sum().to_numpy(), color='navy')
g.ax_joint.set_xticks(np.arange(0.5, D))
g.ax_joint.set_xticklabels(range(1, D + 1), rotation=0)
g.ax_joint.set_yticks(np.arange(0.5, H))
g.ax_joint.set_yticklabels(range(H), rotation=0)
# remove ticks between heatmap and histograms
g.ax_marg_x.tick_params(axis='x', bottom=False, labelbottom=False)
g.ax_marg_y.tick_params(axis='y', left=False, labelleft=False)
# remove ticks showing the heights of the histograms
g.ax_marg_x.tick_params(axis='y', left=False, labelleft=False)
g.ax_marg_y.tick_params(axis='x', bottom=False, labelbottom=False)
g.fig.set_size_inches(20, 8) # jointplot creates its own figure, the size can only be changed afterwards
# g.fig.subplots_adjust(hspace=0.3) # optionally more space for the tick labels
g.fig.subplots_adjust(hspace=0.05, wspace=0.02) # less space needed when there are no tick labels
plt.show()
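If seaborn is not a requirement, a similar layout can be sketched with plain matplotlib and a GridSpec. This is an alternative to the jointplot approach above, not a replacement for it; it uses the small 3x3 example from the question, and the width and height ratios are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([[10, 20, 30], [50, 50, 100], [80, 60, 10]])
n_rows, n_cols = data.shape

fig = plt.figure(figsize=(6, 6))
# 2x2 grid: small marginal axes on top and on the right, heatmap bottom-left.
gs = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                      hspace=0.05, wspace=0.05)
ax_heat = fig.add_subplot(gs[1, 0])
ax_top = fig.add_subplot(gs[0, 0], sharex=ax_heat)
ax_right = fig.add_subplot(gs[1, 1], sharey=ax_heat)

ax_heat.imshow(data, cmap="YlGn", aspect="auto")

col_sums = data.sum(axis=0)  # one value per column, drawn on top
row_sums = data.sum(axis=1)  # one value per row, drawn on the right
ax_top.bar(np.arange(n_cols), col_sums, color="green")
ax_right.barh(np.arange(n_rows), row_sums, color="green")

# Hide the tick labels that would sit between the heatmap and the bars.
ax_top.tick_params(labelbottom=False)
ax_right.tick_params(labelleft=False)
# plt.show()  # uncomment when running interactively
```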

How to color bars who make up 50% of the data?

I am plotting a histogram for some data points with bar heights being the percentage of that bin from the whole data:
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import normal
x = normal(size=1000)
hist, bins = np.histogram(x, bins=20)
plt.bar(bins[:-1], hist.astype(np.float32) / hist.sum(), width=(bins[1]-bins[0]), alpha=0.6)
I would like all bars that together make up 50% of the data to be in a different color. (In my mock-up I selected the colored bars without actually checking whether their sum adds up to 50%.)
Any suggestions how to accomplish this?

Here is how you can plot the first half of the bins with a different color. This looks like your mock-up, but I am not sure it corresponds to 50% of the data (it is not clear to me what you mean by that).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
fig = plt.figure()
ax = fig.add_subplot(111)
# the histogram of the data (density=True replaces the removed normed argument)
n, bins, patches = ax.hist(x, 50, density=True, facecolor='green', alpha=0.75)
# color all the bins below the middle index (integer division for the slice)
for p in patches[:len(bins)//2]:
    p.set_facecolor('red')
# hist uses np.histogram under the hood to create 'n' and 'bins'.
# np.histogram returns the bin edges, so there will be 50 probability
# density values in n, 51 bin edges in bins and 50 patches. To get
# everything lined up, we'll compute the bin centers
bincenters = 0.5*(bins[1:]+bins[:-1])
# add a 'best fit' line for the normal PDF
# (mlab.normpdf was removed from matplotlib; scipy.stats.norm.pdf is used instead)
y = norm.pdf(bincenters, mu, sigma)
l = ax.plot(bincenters, y, 'r--', linewidth=1)
ax.set_xlabel('Smarts')
ax.set_ylabel('Probability')
ax.set_xlim(40, 160)
ax.set_ylim(0, 0.03)
ax.grid(True)
plt.show()
Update:
The key method you want to look at is patch.set_facecolor. Almost everything you plot inside the axes object is a Patch, and as such it has this method. Here is another example, where I arbitrarily chose the first 3 bars to have another color; you can choose based on whatever you decide:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
## the data
N = 5
menMeans = [18, 35, 30, 35, 27]
## necessary variables
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars
## the bars
rects1 = ax.bar(ind, menMeans, width,
                color='black',
                error_kw=dict(elinewidth=2,ecolor='red'))
for patch in rects1.patches[:3]:
    patch.set_facecolor('red')
ax.set_xlim(-width,len(ind)+width)
ax.set_ylim(0,45)
ax.set_ylabel('Scores')
xTickMarks = ['Group'+str(i) for i in range(1,6)]
ax.set_xticks(ind)
xtickNames = ax.set_xticklabels(xTickMarks)
plt.setp(xtickNames, rotation=45, fontsize=10)
plt.show()
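To actually pick bars that make up 50% of the data, one possible reading of the question is "the smallest set of bars whose heights sum to at least 50%". The sketch below takes the tallest bars first until the running total reaches 0.5. This interpretation, the fixed random seed and the colors are my assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed only to make the example repeatable
x = rng.normal(size=1000)
hist, bins = np.histogram(x, bins=20)
frac = hist / hist.sum()        # fraction of the data in each bin

# Take the tallest bars first until their fractions sum to at least 50%.
order = np.argsort(frac)[::-1]
cum = np.cumsum(frac[order])
n_needed = int(np.searchsorted(cum, 0.5)) + 1
selected = order[:n_needed]

colors = ['tab:blue'] * len(frac)
for i in selected:
    colors[i] = 'tab:red'

fig, ax = plt.subplots()
# align='edge' puts each bar between its own bin edges.
ax.bar(bins[:-1], frac, width=np.diff(bins), align='edge',
       color=colors, alpha=0.6)
# plt.show()  # uncomment when running interactively
```

For a different reading, such as the central 50% around the median, only the way selected is computed needs to change; the coloring step stays the same.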
