Matplotlib/pyplot: Auto adjust unit of y Axis

I would like to change the units on the y-axis of the plot below. Ideally, large numbers would use suffixes such as M (million) and k (thousand); for example, the y-axis should read 50k, 100k, 150k, etc.
The plot below is generated by the following code snippet:
plt.autoscale(enable=True, axis='both')
plt.title("TTL Distribution")
plt.xlabel('TTL Value')
plt.ylabel('Number of Packets')
y = graphy # data from a sqlite query
x = graphx # data from a sqlite query
width = 0.5
plt.bar(x, y, width, align='center', linewidth=2, color='red', edgecolor='red')
fig = plt.gcf()
plt.show()
I saw this post and thought I could write my own formatting function:
def y_fmt(x, y):
    if max_y > 1000000:
        val = int(y)/1000000
        return '{:d} M'.format(val)
    elif max_y > 1000:
        val = int(y) / 1000
        return '{:d} k'.format(val)
    else:
        return y
However, it seems there is no plt.yaxis.set_major_formatter(tick.FuncFormatter(y_fmt)) function available for the bar plot I am using.
How can I achieve better formatting of the y-axis?

In principle, you can always set custom labels via plt.gca().set_yticklabels().
However, I don't see why matplotlib.ticker.FuncFormatter shouldn't be usable here. FuncFormatter is designed for exactly this purpose: providing custom tick labels depending on each tick's position and value.
There is actually a nice example in the matplotlib example collection.
In this case we can use FuncFormatter to append unit prefixes as suffixes to the tick labels of a matplotlib plot. To this end, we iterate over the powers of 1000 (from 1e9 down to 1e-9) and pick the first one that the absolute value reaches. If the scaled value is a whole number, we format it as an integer followed by the respective unit symbol. If there is a remainder after the decimal point, we check how many decimal places are needed to format the number.
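The prefix-selection step described above can be isolated as a plain function. This is a minimal sketch, not the answer's y_fmt: the names si_format and PREFIXES are hypothetical, and instead of counting the digits in str(val) it rounds to at most max_decimals places and then strips trailing zeros.

```python
# Hypothetical table of (factor, suffix) pairs, largest first
PREFIXES = [(1e9, "G"), (1e6, "M"), (1e3, "k"), (1.0, ""),
            (1e-3, "m"), (1e-6, "u"), (1e-9, "n")]


def si_format(value, max_decimals=3):
    """Format value with an SI suffix, e.g. 150000 -> '150 k'."""
    if value == 0:
        return "0"
    for factor, suffix in PREFIXES:
        # Use the first power of 1000 that the magnitude reaches
        if abs(value) >= factor:
            text = "{:.{n}f}".format(value / factor, n=max_decimals)
            if "." in text:
                # Drop trailing zeros and a dangling decimal point
                text = text.rstrip("0").rstrip(".")
            return "{} {}".format(text, suffix).rstrip()
    # Smaller than the smallest prefix: fall back to plain formatting
    return "{:g}".format(value)
```

To use it on an axis, it can be wrapped as FuncFormatter(lambda v, pos: si_format(v)).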
Here is a complete example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
def y_fmt(y, pos):
    decades = [1e9, 1e6, 1e3, 1e0, 1e-3, 1e-6, 1e-9 ]
    suffix = ["G", "M", "k", "" , "m" , "u", "n" ]
    if y == 0:
        return str(0)
    for i, d in enumerate(decades):
        if np.abs(y) >= d:
            val = y/float(d)
            # number of digits after the decimal point in the scaled value
            signf = len(str(val).split(".")[1])
            if signf == 0:
                return '{val:d} {suffix}'.format(val=int(val), suffix=suffix[i])
            else:
                if signf == 1:
                    if str(val).split(".")[1] == "0":
                        return '{val:d} {suffix}'.format(val=int(round(val)), suffix=suffix[i])
                tx = "{"+"val:.{signf}f".format(signf = signf) +"} {suffix}"
                return tx.format(val=val, suffix=suffix[i])
    # FuncFormatter must return a string
    return str(y)
fig, ax = plt.subplots(ncols=3, figsize=(10,5))
x = np.linspace(0,349,num=350)
y = np.sinc((x-66.)/10.3)**2*1.5e6+np.sinc((x-164.)/8.7)**2*660000.+np.random.rand(len(x))*76000.
width = 1
ax[0].bar(x, y, width, align='center', linewidth=2, color='red', edgecolor='red')
ax[0].yaxis.set_major_formatter(FuncFormatter(y_fmt))
ax[1].bar(x[::-1], y*(-0.8e-9), width, align='center', linewidth=2, color='orange', edgecolor='orange')
ax[1].yaxis.set_major_formatter(FuncFormatter(y_fmt))
ax[2].fill_between(x, np.sin(x/100.)*1.7+100010, np.cos(x/100.)*1.7+100010, linewidth=2, color='#a80975', edgecolor='#a80975')
ax[2].yaxis.set_major_formatter(FuncFormatter(y_fmt))
for axes in ax:
    axes.set_title("TTL Distribution")
    axes.set_xlabel('TTL Value')
    axes.set_ylabel('Number of Packets')
    axes.set_xlim([x[0], x[-1]+1])
plt.show()
which provides the following plot:

You were pretty close. One (possibly) confusing thing about FuncFormatter is that the first argument is the tick value and the second is the tick position, which, when named x and y, is easy to mix up for the y-axis. For clarity, I renamed them in the example below.
The function should take in two inputs (tick value x and position pos) and return a string.
(http://matplotlib.org/api/ticker_api.html#matplotlib.ticker.FuncFormatter)
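A quick way to confirm the argument order is to call the formatter object directly, since FuncFormatter forwards (value, pos) to the wrapped function. The sketch below is illustrative only: thousands_fmt is a hypothetical name, and it uses '{:g}' so that a value such as 2.5 M keeps its decimal part instead of being truncated.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def thousands_fmt(tick_val, pos):
    # First argument: the tick value; second: the tick position
    if abs(tick_val) >= 1e6:
        return "{:g} M".format(tick_val / 1e6)
    if abs(tick_val) >= 1e3:
        return "{:g} k".format(tick_val / 1e3)
    return "{:g}".format(tick_val)


formatter = FuncFormatter(thousands_fmt)

fig, ax = plt.subplots()
ax.bar([0, 1, 2], [50000, 100000, 150000])
# The yaxis of an Axes holding a bar plot accepts a formatter like any other
ax.yaxis.set_major_formatter(formatter)
```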
Working example:
import numpy as np
import matplotlib.pyplot as pl
import matplotlib.ticker as tick
def y_fmt(tick_val, pos):
    if tick_val > 1000000:
        # integer division: '{:d}' needs an int in Python 3
        val = int(tick_val) // 1000000
        return '{:d} M'.format(val)
    elif tick_val > 1000:
        val = int(tick_val) // 1000
        return '{:d} k'.format(val)
    else:
        # FuncFormatter must return a string
        return '{:g}'.format(tick_val)
x = np.arange(300)
y = np.random.randint(0,2000000,x.size)
width = 0.5
pl.bar(x, y, width, align='center', linewidth=2, color='red', edgecolor='red')
pl.xlim(0,300)
ax = pl.gca()
ax.yaxis.set_major_formatter(tick.FuncFormatter(y_fmt))
pl.show()
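Matplotlib also ships matplotlib.ticker.EngFormatter, which picks SI prefixes (k, M, G, ...) by itself, so a custom function may not be needed at all. A minimal sketch; the exact spacing of the label text may differ between matplotlib versions, and the bar data is made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.ticker import EngFormatter

fig, ax = plt.subplots()
ax.bar(range(3), [50000, 100000, 150000])
# places=0: no decimals; no unit string, so labels look like "50 k"
eng = EngFormatter(places=0)
ax.yaxis.set_major_formatter(eng)
```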

Related

Adding secondary axis and making grid lines equal

I am trying to add a secondary axis to a plot and make the grid lines equally spaced along y, but the code below doesn't do what it is supposed to. The y2A, y2B values are not right: they refer to the xlim values, not the ylim values. Any ideas?
import numpy as np
import matplotlib.pyplot as plt
def CtoF(y):
    return y * 1.8 + 32
def FtoC(y):
    return (y - 32) / 1.8
def setAxis2(ax1):
    ax2 = ax1.secondary_yaxis('right', functions=(CtoF, FtoC))
    ax2.set_ylabel('Fahrenheit')
    return ax2
x = np.arange(100)
y = np.random.rand(100)
plt.plot(x,y)
ax1 = plt.gca()
ax1.set_ylabel('Celsius')
ax1.grid()
#Add the 2nd axis for Fahrenheit
ax2 = setAxis2(ax1)
#Get the ylimits and space them equally
[y1A,y1B] = ax1.get_ylim()
[y2A,y2B] = ax2.get_ylim()
ax1.set_yticks(np.linspace(y1A,y1B, 10))
ax2.set_yticks(np.linspace(y2A,y2B, 10)) #Doesn't work
print(y1A,y1B) #
print(y2A,y2B) #Doesn't output the expected values
I tried another method that works well (with the same version of matplotlib), but the question about the issue above remains. The method that works is below:
ticks1 = ax1.get_yticks()
ticks2 = CtoF(ticks1)
ax2.set_yticks(ticks2)
Instead of getting y2A and y2B from the y-limits of ax2, we can calculate them directly with CtoF:
# Get the y-limits and space them equally.
y1A, y1B = ax1.get_ylim()
y2A, y2B = map(CtoF, (y1A, y1B))
n = 10
ax1.set_yticks(np.linspace(y1A, y1B, n))
ax2.set_yticks(np.linspace(y2A, y2B, n))
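Why computing y2A and y2B through CtoF lines the grids up: CtoF is an affine map, so spacing the converted limits evenly gives the same values as converting each evenly spaced Celsius tick. A small numerical check with made-up limits:

```python
import numpy as np


def CtoF(y):
    return y * 1.8 + 32


# Hypothetical primary-axis limits in Celsius
y1A, y1B = -5.0, 40.0
n = 10
ticks_c = np.linspace(y1A, y1B, n)
# Convert the limits first, then space evenly ...
ticks_f_from_limits = np.linspace(CtoF(y1A), CtoF(y1B), n)
# ... which matches converting every Celsius tick, because CtoF is affine
ticks_f_mapped = CtoF(ticks_c)
```

For a non-linear conversion this equality no longer holds, and converting the primary ticks directly (ax2.set_yticks(CtoF(ax1.get_yticks())), as in the question) is the safer option.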

How to show percentage in Seaborn countplot [duplicate]

I was wondering if it is possible to create a Seaborn count plot, but instead of actual counts on the y-axis, show the relative frequency (percentage) within its group (as specified with the hue parameter).
I sort of fixed this with the following approach, but I can't imagine this is the easiest approach:
# Plot percentage of occupation per income class
grouped = df.groupby(['income'], sort=False)
occupation_counts = grouped['occupation'].value_counts(normalize=True, sort=False)
occupation_data = [
{'occupation': occupation, 'income': income, 'percentage': percentage*100} for
(income, occupation), percentage in dict(occupation_counts).items()
]
df_occupation = pd.DataFrame(occupation_data)
p = sns.barplot(x="occupation", y="percentage", hue="income", data=df_occupation)
_ = plt.setp(p.get_xticklabels(), rotation=90) # Rotate labels
I'm using the well known adult data set from the UCI machine learning repository. The pandas dataframe is created like this:
# Read the adult dataset
df = pd.read_csv(
"data/adult.data",
engine='c',
lineterminator='\n',
names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',
'marital_status', 'occupation', 'relationship', 'race', 'sex',
'capital_gain', 'capital_loss', 'hours_per_week',
'native_country', 'income'],
header=None,
skipinitialspace=True,
na_values="?"
)
This question is sort of related, but does not make use of the hue parameter. And in my case I cannot just change the labels on the y-axis, because the height of the bar must depend on the group.
With newer versions of seaborn you can do the following:
import numpy as np
import pandas as pd
import seaborn as sns
sns.set(color_codes=True)
df = sns.load_dataset('titanic')
df.head()
x,y = 'class', 'survived'
(df
.groupby(x)[y]
.value_counts(normalize=True)
.mul(100)
.rename('percent')
.reset_index()
.pipe((sns.catplot,'data'), x=x,y='percent',hue=y,kind='bar'))
Update: Also show percentages on top of the bars
If you also want the percentage values printed on the bars, you can do the following:
import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
df.head()
x,y = 'class', 'survived'
df1 = df.groupby(x)[y].value_counts(normalize=True)
df1 = df1.mul(100)
df1 = df1.rename('percent').reset_index()
g = sns.catplot(x=x,y='percent',hue=y,kind='bar',data=df1)
g.ax.set_ylim(0,100)
for p in g.ax.patches:
    # built-in round() works whether the height is a NumPy or a Python float
    txt = str(round(p.get_height(), 2)) + '%'
    txt_x = p.get_x()
    txt_y = p.get_height()
    g.ax.text(txt_x, txt_y, txt)
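On matplotlib 3.4 or newer, Axes.bar_label can place the text instead of the manual loop over patches. A minimal sketch on a plain bar chart with made-up values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(["no", "yes"], [25.0, 75.0])
# One text label per bar in the container, formatted as a percentage
labels = ax.bar_label(bars, fmt="%.1f%%")
```

With the catplot above, the equivalent would presumably be to loop over g.ax.containers and call g.ax.bar_label(c, fmt="%.1f%%") for each container; this has not been checked against specific seaborn versions.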
I might be confused. The difference between your output and the output of
occupation_counts = (df.groupby(['income'])['occupation']
.value_counts(normalize=True)
.rename('percentage')
.mul(100)
.reset_index()
.sort_values('occupation'))
p = sns.barplot(x="occupation", y="percentage", hue="income", data=occupation_counts)
_ = plt.setp(p.get_xticklabels(), rotation=90) # Rotate labels
is, it seems to me, only the order of the columns.
And you seem to care about that, since you pass sort=False. But then, in your code the order is determined purely by chance (and with Python 3.5 the order in which the dictionary is iterated even changes from run to run).
You could do this with sns.histplot by setting the following properties:
stat = 'density' (this will make the y-axis the density rather than count)
common_norm = False (this will normalize each density independently)
See the simple example below:
import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
ax = sns.histplot(x = df['class'], hue=df['survived'], multiple="dodge",
stat = 'density', shrink = 0.8, common_norm=False)
You can use the library Dexplot to do counting as well as normalizing over any variable to get relative frequencies.
Pass the count function the name of the variable you would like to count and it will automatically produce a bar plot of the counts of all unique values. Use split to subdivide the counts by another variable. Notice that Dexplot automatically wraps the x-tick labels.
dxp.count('occupation', data=df, split='income')
Use the normalize parameter to normalize the counts over any variable (or combination of variables with a list). You can also use True to normalize over the grand total of counts.
dxp.count('occupation', data=df, split='income', normalize='income')
It boggled my mind that Seaborn doesn't provide anything like this out of the box.
Still, it was pretty easy to tweak the source code to get what you wanted.
The following code, with the function percentageplot(x, hue, data), works just like sns.countplot but normalizes each bar per group (i.e. divides each green bar's value by the sum of all green bars).
In effect, it turns a plain sns.countplot (hard to interpret because Apple and Android have a different N) into a percentageplot, normed so that the bars reflect the proportion of the total for Apple vs. Android.
Hope this helps!!
# Relies on private internals of older seaborn releases (the _CategoricalPlotter
# API changed later), so this may not run on a current seaborn.
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from seaborn import utils
from seaborn.algorithms import bootstrap
from seaborn.categorical import _CategoricalPlotter, remove_na
class _CategoricalStatPlotter(_CategoricalPlotter):
    @property
    def nested_width(self):
        """A float with the width of plot elements when hue nesting is used."""
        return self.width / len(self.hue_names)
    def estimate_statistic(self, estimator, ci, n_boot):
        if self.hue_names is None:
            statistic = []
            confint = []
        else:
            statistic = [[] for _ in self.plot_data]
            confint = [[] for _ in self.plot_data]
        for i, group_data in enumerate(self.plot_data):
            # Option 1: we have a single layer of grouping
            # --------------------------------------------
            if self.plot_hues is None:
                if self.plot_units is None:
                    stat_data = remove_na(group_data)
                    unit_data = None
                else:
                    unit_data = self.plot_units[i]
                    have = pd.notnull(np.c_[group_data, unit_data]).all(axis=1)
                    stat_data = group_data[have]
                    unit_data = unit_data[have]
                # Estimate a statistic from the vector of data
                if not stat_data.size:
                    statistic.append(np.nan)
                else:
                    statistic.append(estimator(stat_data, len(np.concatenate(self.plot_data))))
                # Get a confidence interval for this estimate
                if ci is not None:
                    if stat_data.size < 2:
                        confint.append([np.nan, np.nan])
                        continue
                    boots = bootstrap(stat_data, func=estimator,
                                      n_boot=n_boot,
                                      units=unit_data)
                    confint.append(utils.ci(boots, ci))
            # Option 2: we are grouping by a hue layer
            # ----------------------------------------
            else:
                for j, hue_level in enumerate(self.hue_names):
                    if not self.plot_hues[i].size:
                        statistic[i].append(np.nan)
                        if ci is not None:
                            confint[i].append((np.nan, np.nan))
                        continue
                    hue_mask = self.plot_hues[i] == hue_level
                    group_total_n = (np.concatenate(self.plot_hues) == hue_level).sum()
                    if self.plot_units is None:
                        stat_data = remove_na(group_data[hue_mask])
                        unit_data = None
                    else:
                        group_units = self.plot_units[i]
                        have = pd.notnull(
                            np.c_[group_data, group_units]
                        ).all(axis=1)
                        stat_data = group_data[hue_mask & have]
                        unit_data = group_units[hue_mask & have]
                    # Estimate a statistic from the vector of data
                    if not stat_data.size:
                        statistic[i].append(np.nan)
                    else:
                        statistic[i].append(estimator(stat_data, group_total_n))
                    # Get a confidence interval for this estimate
                    if ci is not None:
                        if stat_data.size < 2:
                            confint[i].append([np.nan, np.nan])
                            continue
                        boots = bootstrap(stat_data, func=estimator,
                                          n_boot=n_boot,
                                          units=unit_data)
                        confint[i].append(utils.ci(boots, ci))
        # Save the resulting values for plotting
        self.statistic = np.array(statistic)
        self.confint = np.array(confint)
        # Rename the value label to reflect the estimation
        if self.value_label is not None:
            self.value_label = "{}({})".format(estimator.__name__,
                                               self.value_label)
    def draw_confints(self, ax, at_group, confint, colors,
                      errwidth=None, capsize=None, **kws):
        if errwidth is not None:
            kws.setdefault("lw", errwidth)
        else:
            kws.setdefault("lw", mpl.rcParams["lines.linewidth"] * 1.8)
        for at, (ci_low, ci_high), color in zip(at_group,
                                                 confint,
                                                 colors):
            if self.orient == "v":
                ax.plot([at, at], [ci_low, ci_high], color=color, **kws)
                if capsize is not None:
                    ax.plot([at - capsize / 2, at + capsize / 2],
                            [ci_low, ci_low], color=color, **kws)
                    ax.plot([at - capsize / 2, at + capsize / 2],
                            [ci_high, ci_high], color=color, **kws)
            else:
                ax.plot([ci_low, ci_high], [at, at], color=color, **kws)
                if capsize is not None:
                    ax.plot([ci_low, ci_low],
                            [at - capsize / 2, at + capsize / 2],
                            color=color, **kws)
                    ax.plot([ci_high, ci_high],
                            [at - capsize / 2, at + capsize / 2],
                            color=color, **kws)
class _BarPlotter(_CategoricalStatPlotter):
    """Show point estimates and confidence intervals with bars."""
    def __init__(self, x, y, hue, data, order, hue_order,
                 estimator, ci, n_boot, units,
                 orient, color, palette, saturation, errcolor, errwidth=None,
                 capsize=None):
        """Initialize the plotter."""
        self.establish_variables(x, y, hue, data, orient,
                                 order, hue_order, units)
        self.establish_colors(color, palette, saturation)
        self.estimate_statistic(estimator, ci, n_boot)
        self.errcolor = errcolor
        self.errwidth = errwidth
        self.capsize = capsize
    def draw_bars(self, ax, kws):
        """Draw the bars onto `ax`."""
        # Get the right matplotlib function depending on the orientation
        barfunc = ax.bar if self.orient == "v" else ax.barh
        barpos = np.arange(len(self.statistic))
        if self.plot_hues is None:
            # Draw the bars
            barfunc(barpos, self.statistic, self.width,
                    color=self.colors, align="center", **kws)
            # Draw the confidence intervals
            errcolors = [self.errcolor] * len(barpos)
            self.draw_confints(ax,
                               barpos,
                               self.confint,
                               errcolors,
                               self.errwidth,
                               self.capsize)
        else:
            for j, hue_level in enumerate(self.hue_names):
                # Draw the bars
                offpos = barpos + self.hue_offsets[j]
                barfunc(offpos, self.statistic[:, j], self.nested_width,
                        color=self.colors[j], align="center",
                        label=hue_level, **kws)
                # Draw the confidence intervals
                if self.confint.size:
                    confint = self.confint[:, j]
                    errcolors = [self.errcolor] * len(offpos)
                    self.draw_confints(ax,
                                       offpos,
                                       confint,
                                       errcolors,
                                       self.errwidth,
                                       self.capsize)
    def plot(self, ax, bar_kws):
        """Make the plot."""
        self.draw_bars(ax, bar_kws)
        self.annotate_axes(ax)
        if self.orient == "h":
            ax.invert_yaxis()
def percentageplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
                   orient=None, color=None, palette=None, saturation=.75,
                   ax=None, **kwargs):
    # Estimator calculates required statistic (proportion)
    estimator = lambda x, y: (float(len(x))/y)*100
    ci = None
    n_boot = 0
    units = None
    errcolor = None
    if x is None and y is not None:
        orient = "h"
        x = y
    elif y is None and x is not None:
        orient = "v"
        y = x
    elif x is not None and y is not None:
        raise TypeError("Cannot pass values for both `x` and `y`")
    else:
        raise TypeError("Must pass values for either `x` or `y`")
    plotter = _BarPlotter(x, y, hue, data, order, hue_order,
                          estimator, ci, n_boot, units,
                          orient, color, palette, saturation,
                          errcolor)
    plotter.value_label = "Percentage"
    if ax is None:
        ax = plt.gca()
    plotter.plot(ax, kwargs)
    return ax
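The normalization that percentageplot performs (each bar divided by the total of its hue level) can also be computed up front with pandas and passed to sns.barplot, without touching seaborn internals. A sketch with hypothetical toy data (the column names os and answer are made up); groupby(...).value_counts(normalize=True), as used in an earlier answer, gives the same numbers:

```python
import pandas as pd

# Hypothetical toy data: two hue levels with very different sizes
df = pd.DataFrame({
    "os": ["Apple"] * 4 + ["Android"] * 2,
    "answer": ["a", "a", "b", "c", "a", "b"],
})

# Count each (hue level, category) pair ...
counts = df.groupby("os")["answer"].value_counts()
# ... then divide by the total of its hue level
pct = counts / counts.groupby(level="os").transform("sum") * 100
# Long format for sns.barplot(x="answer", y="percent", hue="os", data=plot_df)
plot_df = pct.rename("percent").reset_index()
```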
You can provide estimators for the height of the bars (along the y-axis) in a seaborn barplot by using the estimator keyword; this lets a barplot mimic a countplot that shows percentages.
ax = sns.barplot(x="x", y="x", data=df, estimator=lambda x: len(x) / len(df) * 100)
The above code snippet is from https://github.com/mwaskom/seaborn/issues/1027
They have a whole discussion about how to show percentages in a countplot. This answer is based on the same thread linked above.
In the context of your specific problem, you can probably do something like this:
ax = sb.barplot(x='occupation', y='some_numeric_column', data=raw_data, estimator=lambda x: len(x) / len(raw_data) * 100, hue='income')
ax.set(ylabel="Percent")
The above code worked for me (on a different dataset with different attributes). Note that y must be set to some numeric column; otherwise you get the error: "ValueError: Neither the x nor y variable appears to be numeric."
Building on this answer, using stat="probability" worked best for me.
Taken from the sns.histplot documentation on the "stat" parameter:
Aggregate statistic to compute in each bin.
count: show the number of observations in each bin
frequency: show the number of observations divided by the bin width
probability or proportion: normalize such that bar heights sum to 1
percent: normalize such that bar heights sum to 100
density: normalize such that the total area of the histogram equals 1
import seaborn as sns
df = sns.load_dataset('titanic')
ax = sns.histplot(
x = df['class'],
hue=df['survived'],
multiple="dodge",
stat = 'probability',
shrink = 0.5,
common_norm=False
)

Matplotlib: different scale on negative side of the axis

Background
I am trying to show three variables on a single plot. I have connected the three points using lines of different colours based on some other variables. This is shown here
Problem
What I want is a different scale on the negative x-axis. This would let me show positive x ticks and a different axis label on that side, and give a clear, uncluttered view of the lines on the left side of the image.
Question
How can I have a separate positive x-axis that starts at 0 and runs in the negative direction?
Have xticks based on data plotted in that direction
Have a separate xlabel for this new axis
Additional information
I have checked other questions regarding inclusion of multiple axes e.g. this and this. However, these questions did not serve the purpose.
Code Used
font_size = 20
plt.rcParams.update({'font.size': font_size})
fig = plt.figure()
ax = fig.add_subplot(111)
#read my_data from file or create it
for case in my_data:
    #Iterating over my_data
    if condition1 == True:
        local_linestyle = '-'
        local_color = 'r'
        local_line_alpha = 0.6
    elif condition2 == 1:
        local_linestyle = '-'
        local_color = 'b'
        local_line_alpha = 0.6
    else:
        local_linestyle = '--'
        local_color = 'g'
        local_line_alpha = 0.6
    datapoint = [case[0], case[1], case[2]]
    plt.plot(datapoint[0], 0, color=local_color)
    plt.plot(-datapoint[2], 0, color=local_color)
    plt.plot(0, datapoint[1], color=local_color)
    plt.plot([datapoint[0], 0], [0, datapoint[1]], linestyle=local_linestyle, color=local_color)
    plt.plot([-datapoint[2], 0], [0, datapoint[1]], linestyle=local_linestyle, color=local_color)
plt.show()
exit()
You can define a custom scale, where values below zero are scaled differently than those above zero.
import numpy as np
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
from matplotlib.ticker import FuncFormatter
class AsymScale(mscale.ScaleBase):
    name = 'asym'
    def __init__(self, axis, **kwargs):
        # matplotlib >= 3.1 expects the axis to be passed to ScaleBase
        mscale.ScaleBase.__init__(self, axis)
        self.a = kwargs.get("a", 1)
    def get_transform(self):
        return self.AsymTrans(self.a)
    def set_default_locators_and_formatters(self, axis):
        # possibly, set a different locator and formatter here.
        fmt = lambda x,pos: "{}".format(np.abs(x))
        axis.set_major_formatter(FuncFormatter(fmt))
    class AsymTrans(mtransforms.Transform):
        input_dims = 1
        output_dims = 1
        is_separable = True
        has_inverse = True
        def __init__(self, a):
            mtransforms.Transform.__init__(self)
            self.a = a
        def transform_non_affine(self, x):
            # leave x >= 0 unchanged, multiply the negative part by a
            return (x >= 0)*x + (x < 0)*x*self.a
        def inverted(self):
            return AsymScale.InvertedAsymTrans(self.a)
    class InvertedAsymTrans(AsymTrans):
        def transform_non_affine(self, x):
            return (x >= 0)*x + (x < 0)*x/self.a
        def inverted(self):
            return AsymScale.AsymTrans(self.a)
With this, you provide a scale parameter a that scales the negative part of the axis.
# Now that the Scale class has been defined, it must be registered so
# that ``matplotlib`` can find it.
mscale.register_scale(AsymScale)
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([-2, 0, 5], [0,1,0])
ax.set_xscale("asym", a=2)
ax.annotate("negative axis", xy=(.25,0), xytext=(0,-30),
xycoords = "axes fraction", textcoords="offset points", ha="center")
ax.annotate("positive axis", xy=(.75,0), xytext=(0,-30),
xycoords = "axes fraction", textcoords="offset points", ha="center")
plt.show()
The question is not very clear about what xticks and labels are desired, so I left that out for now.
Here's how to get what you want. This solution uses two twinned Axes objects to get different scaling to the left and right of the origin, and then hides all the evidence:
import numpy as np
import matplotlib.pyplot as plt
tickkwargs = {m+k:False for k in ('bottom','top','left','right') for m in ('','label')}
p = np.zeros((10, 3, 2))
p[:,0,0] -= np.arange(10)*.1 + .5
p[:,1,1] += np.repeat(np.arange(5), 2)*.1 + .3
p[:,2,0] += np.arange(10)*.5 + 2
fig = plt.figure(figsize=(8,6))
host = fig.add_subplot(111)
par = host.twiny()
host.set_xlim(-6, 6)
par.set_xlim(-1, 1)
for ps in p:
    # mask the points with negative x values
    ppos = ps[ps[:,0] >= 0].T
    host.plot(*ppos)
    # mask the points with positive x values
    pneg = ps[ps[:,0] <= 0].T
    par.plot(*pneg)
# hide all possible ticks/notation text that could be set by the second x axis
par.tick_params(axis="both", **tickkwargs)
par.xaxis.get_offset_text().set_visible(False)
# fix the x tick labels so they're all positive
host.set_xticklabels(np.abs(host.get_xticks()))
fig.show()
Here's what the set of points p used in the code above looks like when plotted normally:
fig = plt.figure(figsize=(8,6))
ax = fig.gca()
for ps in p:
    ax.plot(*ps.T)
fig.show()
The method of deriving a class from mscale.ScaleBase, as shown in other answers, may be too complicated for your purpose.
Instead, you can pass a pair of forward/inverse transform functions to set_xscale or set_yscale, as follows.
import matplotlib.pyplot as plt
def get_scale(a=1): # a is the scale of your negative axis
    def forward(x):
        x = (x >= 0) * x + (x < 0) * x * a
        return x
    def inverse(x):
        x = (x >= 0) * x + (x < 0) * x / a
        return x
    return forward, inverse
fig, ax = plt.subplots()
forward, inverse = get_scale(a=3)
ax.set_xscale('function', functions=(forward, inverse)) # this is for setting x axis
# do plotting
More examples can be found in this doc.
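A sketch of the same function-scale approach with np.where in place of the boolean multiplication; get_scale here is a rewritten version of the helper above, and ax.set_xscale('function', functions=...) needs matplotlib 3.1 or newer. With a=3 the negative side is drawn three times wider than the positive side:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt


def get_scale(a=1.0):
    """Return (forward, inverse) that stretch the x < 0 half by factor a."""
    def forward(x):
        x = np.asarray(x, dtype=float)
        return np.where(x >= 0, x, x * a)

    def inverse(x):
        x = np.asarray(x, dtype=float)
        return np.where(x >= 0, x, x / a)

    return forward, inverse


forward, inverse = get_scale(a=3)
fig, ax = plt.subplots()
ax.plot([-2, 0, 5], [0, 1, 0])
ax.set_xscale("function", functions=(forward, inverse))
```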

getting matplotlib radar plot with pandas

I am trying to go a step further by creating a radar plot as described in this question. I am using the same source code as that question, except that I'm trying to implement it with a pandas DataFrame and pivot tables.
import numpy as np
import pandas as pd
from io import StringIO
import matplotlib.pyplot as plt
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
def radar_factory(num_vars, frame='circle'):
    """Create a radar chart with `num_vars` axes."""
    # calculate evenly-spaced axis angles
    theta = 2 * np.pi * np.linspace(0, 1 - 1. / num_vars, num_vars)
    # rotate theta such that the first axis is at the top
    theta += np.pi / 2
    def draw_poly_frame(self, x0, y0, r):
        # TODO: use transforms to convert (x, y) to (r, theta)
        verts = [(r * np.cos(t) + x0, r * np.sin(t) + y0) for t in theta]
        return plt.Polygon(verts, closed=True, edgecolor='k')
    def draw_circle_frame(self, x0, y0, r):
        return plt.Circle((x0, y0), r)
    frame_dict = {'polygon': draw_poly_frame, 'circle': draw_circle_frame}
    if frame not in frame_dict:
        raise ValueError('unknown value for `frame`: %s' % frame)
    class RadarAxes(PolarAxes):
        """Class for creating a radar chart (a.k.a. a spider or star chart)
        http://en.wikipedia.org/wiki/Radar_chart
        """
        name = 'radar'
        # use 1 line segment to connect specified points
        RESOLUTION = 1
        # define draw_frame method
        draw_frame = frame_dict[frame]
        def fill(self, *args, **kwargs):
            """Override fill so that line is closed by default"""
            closed = kwargs.pop('closed', True)
            return super(RadarAxes, self).fill(closed=closed, *args, **kwargs)
        def plot(self, *args, **kwargs):
            """Override plot so that line is closed by default"""
            lines = super(RadarAxes, self).plot(*args, **kwargs)
            for line in lines:
                self._close_line(line)
        def _close_line(self, line):
            x, y = line.get_data()
            # FIXME: markers at x[0], y[0] get doubled-up
            if x[0] != x[-1]:
                x = np.concatenate((x, [x[0]]))
                y = np.concatenate((y, [y[0]]))
                line.set_data(x, y)
        def set_varlabels(self, labels):
            self.set_thetagrids(theta * 180 / np.pi, labels)
        def _gen_axes_patch(self):
            x0, y0 = (0.5, 0.5)
            r = 0.5
            return self.draw_frame(x0, y0, r)
    register_projection(RadarAxes)
    return theta
def day_radar_plot(df):
    fig = plt.figure(figsize=(6,6))
    #adjust spacing around the subplots
    fig.subplots_adjust(wspace=0.25,hspace=0.20,top=0.85,bottom=0.05)
    ldo,rup = 0.1,0.8 #leftdown and right up normalized
    ax = fig.add_axes([ldo,ldo,rup,rup],polar=True)
    N = len(df['Group1'].unique())
    theta = radar_factory(N)
    polar_df = pd.DataFrame(df.groupby([df['Group1'],df['Type'],df['Vote']]).size())
    polar_df.columns = ['Count']
    radii = polar_df['Count'].values
    names = polar_df.index.values
    #get the number of unique colors needed
    num_colors_needed = len(names)
    #Create the list of unique colors needed for red and green shades
    Rcolors = []
    Gcolors = []
    for i in range(num_colors_needed):
        ri=1-(float(i)/float(num_colors_needed))
        gi=0.
        bi=0.
        Rcolors.append((ri,gi,bi))
    for i in range(num_colors_needed):
        ri=0.
        gi=1-(float(i)/float(num_colors_needed))
        bi=0.
        Gcolors.append((ri,gi,bi))
    from_x = np.linspace(0,0.95,num_colors_needed)
    to_x = from_x + 0.05
    i = 0
    for d,f,R,G in zip(radii,polar_df.index,Rcolors,Gcolors):
        i = i+1
        if f[2].lower() == 'no':
            ax.plot(theta,d,color=R)
            ax.fill(theta,d,facecolor=R,alpha=0.25)
            #this is where I think I have the issue
            ax.axvspan(from_x[i],to_x[i],color=R)
        elif f[2].lower() == 'yes':
            ax.plot(theta,d,color=G)
            ax.fill(theta,d,facecolor=G,alpha=0.25)
            #this is where I think I have the issue
            ax.axvspan(from_x[i],to_x[i],color=G)
    plt.show()
So, let's say I have this StringIO containing a list of Group1 members voting either yes or no, each tied to a numbered type. The numbers are arbitrary labels, just for the example:
fakefile = StringIO("""\
Group1,Type,Vote
James,7,YES\nRachael,7,YES\nChris,2,YES\nRachael,9,NO
Chris,2,YES\nChris,7,NO\nRachael,9,NO\nJames,2,NO
James,7,NO\nJames,9,YES\nRachael,9,NO
Chris,2,YES\nChris,2,YES\nRachael,7,NO
Rachael,7,YES\nJames,9,YES\nJames,9,NO
Rachael,2,NO\nChris,2,YES\nRachael,7,YES
Rachael,9,NO\nChris,9,NO\nJames,7,NO
James,2,YES\nChris,2,NO\nRachael,9,YES
Rachael,9,YES\nRachael,2,NO\nChris,7,YES
James,7,YES\nChris,9,NO\nRachael,9,NO\n
Chris,9,YES
""")
record = pd.read_csv(fakefile, header=0)
day_radar_plot(record)
The error I get is ValueError: x and y must have same first dimension.
As I indicated in my script, I thought I had a solution for it, but apparently I'm going about it the wrong way. Does anyone have any advice or guidance?
Since I'm completely lost as to what you are trying to do, I will simply show how to draw a radar chart from the given data.
It answers the question: how often have people voted Yes or No?
import pandas as pd
import numpy as np
from io import StringIO
import matplotlib.pyplot as plt
fakefile = StringIO("""\
Group1,Type,Vote
James,7,YES\nRachael,7,YES\nChris,2,YES\nRachael,9,NO
Chris,2,YES\nChris,7,NO\nRachael,9,NO\nJames,2,NO
James,7,NO\nJames,9,YES\nRachael,9,NO
Chris,2,YES\nChris,2,YES\nRachael,7,NO
Rachael,7,YES\nJames,9,YES\nJames,9,NO
Rachael,2,NO\nChris,2,YES\nRachael,7,YES
Rachael,9,NO\nChris,9,NO\nJames,7,NO
James,2,YES\nChris,2,NO\nRachael,9,YES
Rachael,9,YES\nRachael,2,NO\nChris,7,YES
James,7,YES\nChris,9,NO\nRachael,9,NO\n
Chris,9,YES""")
df = pd.read_csv(fakefile, header=0)
df["cnt"] = np.ones(len(df))
pt = pd.pivot_table(df, values='cnt', index=['Group1'],
columns=['Vote'], aggfunc=np.sum)
fig = plt.figure()
ax = fig.add_subplot(111, projection="polar")
theta = np.arange(len(pt))/float(len(pt))*2.*np.pi
# plot NumPy arrays so that y[0] below is positional, not a label lookup
l1, = ax.plot(theta, pt["YES"].to_numpy(), color="C2", marker="o", label="YES")
l2, = ax.plot(theta, pt["NO"].to_numpy(), color="C3", marker="o", label="NO")
def _closeline(line):
    # append the first point again so the polygon is closed
    x, y = line.get_data()
    x = np.concatenate((x, [x[0]]))
    y = np.concatenate((y, [y[0]]))
    line.set_data(x, y)
for l in [l1, l2]:
    _closeline(l)
ax.set_xticks(theta)
ax.set_xticklabels(pt.index)
plt.legend()
plt.title("How often have people voted Yes or No?")
plt.show()
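The counting step can also be done with pd.crosstab instead of adding a cnt column and pivoting, and the polygon can be closed on the arrays before plotting rather than by editing the lines afterwards. A sketch with a small made-up sample in the same Group1,Type,Vote layout:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Small hypothetical sample with the same columns as the question's data
data = StringIO("""Group1,Type,Vote
James,7,YES
James,2,NO
Chris,2,YES
Chris,9,NO
Chris,7,YES
Rachael,9,NO
""")
df = pd.read_csv(data)

# One row per person, one column per vote value, cells are counts (0 if absent)
pt = pd.crosstab(df["Group1"], df["Vote"])

# Evenly spaced angles, one per person; repeat the first point to close the shape
theta = np.linspace(0, 2 * np.pi, len(pt), endpoint=False)
theta_closed = np.append(theta, theta[0])
yes_closed = np.append(pt["YES"].to_numpy(), pt["YES"].iloc[0])
```

Plotting is then ax.plot(theta_closed, yes_closed) on a polar Axes, as in the answer above.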

Get the y value of a given x

I have a simple question but have not found an answer.
Let's have a look at this code :
from matplotlib import pyplot
import numpy
x=[0,1,2,3,4]
y=[5,3,40,20,1]
pyplot.plot(x,y)
It is plotted and all the points are linked.
Let's say I want to get the y value at x=1.3.
How can I get the x values matching with y=30 ? (there are two)
Many thanks for your help
You could use shapely to find the intersections:
import matplotlib.pyplot as plt
import numpy as np
import shapely.geometry as SG
x=[0,1,2,3,4]
y=[5,3,40,20,1]
line = SG.LineString(list(zip(x,y)))
y0 = 30
yline = SG.LineString([(min(x), y0), (max(x), y0)])
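# two crossings give a MultiPoint; collect its points explicitly (works with shapely 1.x and 2.x)
coords = np.array([(pt.x, pt.y) for pt in line.intersection(yline).geoms])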
print(coords[:, 0])
fig, ax = plt.subplots()
ax.axhline(y=y0, color='k', linestyle='--')
ax.plot(x, y, 'b-')
ax.scatter(coords[:, 0], coords[:, 1], s=50, c='red')
plt.show()
finds solutions for x at:
[ 1.72972973 2.5 ]
The following code might do what you want. The interpolation of y(x) is straightforward, as the x-values are monotonically increasing. Finding the x-values for a given y is harder once the function is not monotonic, as in this case, so you still need to know roughly where to expect the values to be.
import numpy as np
import scipy.interpolate
import scipy.optimize
x=np.array([0,1,2,3,4])
y=np.array([5,3,40,20,1])
#if the independent variable is monotonically increasing
print(np.interp(1.3, x, y))
# if not, as in the case of finding x(y) here,
# we need to find the zeros of an interpolating function
y0 = 30.
initial_guess = 1.5 #for the first zero,
#initial_guess = 3.0 # for the second zero
f = scipy.interpolate.interp1d(x,y,kind="linear")
fmin = lambda x: np.abs(f(x)-y0)
s = scipy.optimize.fmin(fmin, initial_guess, disp=False)
print(s)
I use Python 3.
print(numpy.interp(1.3, x, y))
Y = 30
eps = 1e-6
j = 0
for i, ((x0, x1), (y0, y1)) in enumerate(zip(zip(x[:-1], x[1:]), zip(y[:-1], y[1:]))):
    dy = y1 - y0
    if abs(dy) < eps:
        if y0 == Y:
            print('There is an infinite number of solutions')
    else:
        # fraction of the way along this segment where y reaches Y
        t = (Y - y0)/dy
        if 0 < t < 1:
            sol = x0 + (x1 - x0)*t
            print('solution #{}: {}'.format(j, sol))
            j += 1
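The segment-by-segment search above can be vectorized with NumPy: shift the curve by the target value and look for sign changes between neighbouring points. A sketch; crossings is a hypothetical helper name, and a segment lying exactly on the target (the infinite-solutions case) is only reported through its endpoints:

```python
import numpy as np


def crossings(x, y, y0):
    """Return the x positions where the piecewise-linear curve y(x) equals y0."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(y, dtype=float) - y0
    # Segments whose endpoints lie strictly on opposite sides of y0
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0]
    # Linear interpolation inside each of those segments
    t = d[idx] / (d[idx] - d[idx + 1])
    hits = x[idx] + t * (x[idx + 1] - x[idx])
    # Data points that lie exactly on y0
    exact = x[d == 0]
    return np.sort(np.concatenate([hits, exact]))


x = [0, 1, 2, 3, 4]
y = [5, 3, 40, 20, 1]
print(crossings(x, y, 30))
```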