Display DataFrame with math symbols in column headers in Jupyter Notebook

I have a DataFrame which I would like to display with Greek symbols in place of the names of the aggregate functions.
df=pd.DataFrame(
[["A",1,2],["A",3,4],["B",5,6],["B",7,8]],
columns=["AB","C", "N"]
)
df=df.groupby(df.AB).agg({
"C":["count", "sum", "mean", "std"],
"N":["sum", "mean", "std"]
})
Which looks like:
I would like to produce something that looks like this:
I have been able to produce:
With
import pandas as pd
import matplotlib.pyplot as plt
data = [[str(cell) for cell in row] for row in df.values]
columns = [
r"Count",
r"C $\Sigma$",
r"C $\bar{x}$",
r"C $\sigma$",
r"N $\Sigma$",
r"N $\bar{x}$",
r"N $\sigma$"]
rows = ["A", "B"]
the_table = plt.table(cellText=data,
rowLabels=rows,
colLabels=columns)
the_table.scale(4,4)
the_table.set_fontsize(24)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
plt.tick_params(axis='y', which='both', right=False, left=False, labelleft=False)
for pos in ['right','top','bottom','left']:
    plt.gca().spines[pos].set_visible(False)
The df.to_latex() feature looks like it could probably do enough for my purposes, but it renders the table as a LaTeX tabular environment, which Jupyter does not support.
Thanks to Elliot below, something like this works quite nicely
substitutions = {
"sum":"\u03a3",
"mean":"\u03bc",
"std":"\u03c3",
True:"\u2705",
False:"\u274c",
"count":"N",
}
pretty = df.\
rename(substitutions, axis=0).\
rename(substitutions, axis=1)
and with:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
border: 1px black solid !important;
color: black !important;
}
th {
text-align: center !important;
}
</style>
Can produce

You could use Unicode characters to get the character headings you'd like, without bothering with to_latex().
If you want the borders you could use to_html with custom options to format the table.
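A minimal sketch of that suggestion, assuming the aggregated frame from the question. The names symbols, pretty and html are only illustrative, and the border, justify and float_format values are arbitrary choices rather than anything the answer prescribes:

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 1, 2], ["A", 3, 4], ["B", 5, 6], ["B", 7, 8]],
    columns=["AB", "C", "N"],
)
agg = df.groupby("AB").agg({
    "C": ["count", "sum", "mean", "std"],
    "N": ["sum", "mean", "std"],
})

# Map aggregate names to Unicode symbols on the second column level only.
symbols = {"sum": "\u03a3", "mean": "\u03bc", "std": "\u03c3", "count": "N"}
pretty = agg.rename(columns=symbols, level=1)

# Render as HTML with a visible border and centred headers.
html = pretty.to_html(border=1, justify="center", float_format="{:.2f}".format)
# In a notebook: from IPython.display import HTML; HTML(html)
```

Restricting the rename to level=1 keeps the top-level column names (C and N) untouched even if one of them happened to match a key in the mapping.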

Format DataFrame to better show borders where the first index column changes [duplicate]

I'm using pandas style in a jupyter notebook to emphasize the borders between subgroups in this dataframe:
(technically speaking: to draw borders at every changed multiindex but disregarding the lowest level)
import numpy as np
import pandas as pd
# some sample df with multiindex
res = np.repeat(["response1","response2","response3","response4"], 4)
mod = ["model1", "model2","model3","model4"]*len(res)
data = np.random.randint(0,50,size=len(mod))
df = pd.DataFrame(zip(res,mod,data), columns=["res","mod","data"])
df.set_index(["res","mod"], inplace=True)
# set borders at individual frequency
indices_with_borders = range(0,len(df), len(np.unique(mod)))
df.style.set_properties(subset=(df.index[indices_with_borders], df.columns), **{
'border-width': '1px', "border-top-style":"solid"})
Result:
Now it looks a bit silly that the borders are only drawn across the columns and do not continue all the way through the multiindex. This would be a more pleasing style:
Does anybody know how / if it can be achieved?
Thanks in advance!
s = df.style
for l0 in ['response1', 'response2', 'response3', 'response4']:
    s.set_table_styles({(l0, 'model4'): [{'selector': '', 'props': 'border-bottom: 3px solid red;'}],
                        (l0, 'model1'): [{'selector': '.level0', 'props': 'border-bottom: 3px solid green'}]},
                       overwrite=False, axis=1)
s
Because a multiindex sparsifies and spans rows you need to control the row classes with a little care. This is a bit painful but it does what you need...
s = df.style
for idx, group_df in df.groupby('res'):
    s.set_table_styles({group_df.index[0]: [{'selector': '', 'props': 'border-top: 3px solid green;'}]},
                       overwrite=False, axis=1)
s
I took Attack68's answer and thought I would show how to make it more generic, which can be useful if you have more levels in the multiindex. It lets you group by any level in the multiindex and adds a border at the top of each group in that level. So if we wanted to do the same for the level mod, we could also do:
df = df.sort_index(level=['mod'])
s = df.style
for idx, group_df in df.groupby('mod'):
    s.set_table_styles({group_df.index[0]: [{'selector': '', 'props': 'border-top: 3px solid green;'}]},
                       overwrite=False, axis=1)
s
I found this to be the easiest solution to automatically add all lines for an arbitrarily deep multi-index:
df.sort_index(inplace=True)
s = df.style
for i, _ in df.iterrows():
    s.set_table_styles({i: [{'selector': '', 'props': 'border-top: 3px solid black;'}]}, overwrite=False, axis=1)
s
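The groupby-based answers above can probably be condensed into a single set_table_styles call. The sketch below is one way to do that; group_start_labels and add_group_borders are hypothetical helper names, not pandas functions, and the approach assumes the rows of each group are contiguous (for example after sort_index):

```python
import numpy as np
import pandas as pd

res = np.repeat(["response1", "response2", "response3", "response4"], 4)
mod = ["model1", "model2", "model3", "model4"] * 4
df = pd.DataFrame({"res": res, "mod": mod, "data": range(16)}).set_index(["res", "mod"])


def group_start_labels(frame, level):
    """Return the index labels of the first row of each group in `level`."""
    values = frame.index.get_level_values(level)
    # duplicated() is False for the first occurrence of each value.
    return list(frame.index[~values.duplicated()])


def add_group_borders(frame, level, css="border-top: 3px solid green;"):
    """Hypothetical helper: style the first row of every group in one call."""
    styles = {label: [{"selector": "", "props": css}]
              for label in group_start_labels(frame, level)}
    return frame.style.set_table_styles(styles, overwrite=False, axis=1)


starts = group_start_labels(df, "res")
# In a notebook: add_group_borders(df, "res")
```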

How to make a graded colourmap in Matplotlib with specified intervals

I am trying to make a custom colourmap in matplotlib but can't quite get it right. What I want is a specific colour for specific values, but for there to be a smooth gradation between them. So something along the lines of this:
colourA = {
    0.01: "#51158d",
    0.25: "#5c2489",
    0.5: "#693584",
    1: "#9d9933",
    3: "#e3dc56",
}
I've seen the method LinearSegmentedColormap.from_list() which I can pass all the hex codes into, but how can I assign specific colours to specific values so that the steps are not equally distributed?
You don't have to use the from_list classmethod. It exists for convenience and spaces the colours equally when it is given a plain list of colours.
You can convert your colors to the cdict required for the LinearSegmentedColormap. Starting with what you already have:
import matplotlib as mpl
colors = {
0.01: "#51158d",
0.25: "#5c2489",
0.5: "#693584",
1: "#9d9933",
3: "#e3dc56",
}
my_cmap = mpl.colors.LinearSegmentedColormap.from_list("my_cmap", list(colors.values()))
You basically need to do two things: normalize your range to between 0 and 1, and convert the hex format to RGB.
For example:
from collections import defaultdict
cdict = defaultdict(list)
vmin = min(colors.keys())
vmax = max(colors.keys())
for k,v in colors.items():
    k_norm = (k - vmin) / (vmax - vmin)
    r,g,b = mpl.colors.to_rgb(v)
    cdict["red"].append((k_norm, r, r))
    cdict["green"].append((k_norm, g, g))
    cdict["blue"].append((k_norm, b, b))
my_cmap = mpl.colors.LinearSegmentedColormap("my_cmap", cdict)
Note that the cdict requires the x-values to increase monotonically, which is the case with this example. Otherwise first sort the colors in the dictionary.
When using this to plot, you can/should use a norm to scale it between whatever values you want, for example:
norm = mpl.colors.Normalize(vmin=0.01, vmax=3)
Details about the format of the cdict can be found in the docstring:
https://matplotlib.org/stable/api/_as_gen/matplotlib.colors.LinearSegmentedColormap.html
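As an alternative sketch: from_list also accepts (value, colour) pairs, with values running from 0 to 1, so the unequal spacing can be expressed without building the cdict by hand. The variable names below (pairs, my_cmap, norm) are illustrative, and the normalization is the same one used in the answer:

```python
import matplotlib as mpl
import matplotlib.colors  # make sure the colors submodule is loaded
import numpy as np

colors = {
    0.01: "#51158d",
    0.25: "#5c2489",
    0.5: "#693584",
    1: "#9d9933",
    3: "#e3dc56",
}

vmin, vmax = min(colors), max(colors)
# Sort by value and rescale the keys to the 0..1 range that from_list expects.
pairs = [((k - vmin) / (vmax - vmin), c) for k, c in sorted(colors.items())]
my_cmap = mpl.colors.LinearSegmentedColormap.from_list("my_cmap", pairs)

# The norm maps data values back onto the same 0..1 range when plotting.
norm = mpl.colors.Normalize(vmin=vmin, vmax=vmax)
colour_at_one = my_cmap(norm(1.0))  # close to "#9d9933"
```

Pass both cmap=my_cmap and norm=norm to the plotting call so that the data value 1, for example, lands on its assigned colour.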

How to place a widget over a panel HoloViews dynamic map

I am trying to display the widgets of a HoloViews dynamic plot (Select, Slider, etc.) over the plot. All I can find is the widget_location argument which takes the location relative to the plot (‘left’ , ‘right’, …). But I want it to be placed over the plot, not next to it. I was wondering if there is a way for doing this?
P.S. for instance there is opts(colorbar_opts={'location':(float,float)}) which can be used to place the colorbar where you want. It would be very useful to have a similar option for widgets.
OK, I found the solution! I have to use custom CSS. The code below shows how to do it.
import holoviews as hv
import panel as pn
pn.extension('ace')
hv.extension("bokeh")
plots = {}
for i in range(5,10):
    data = {
        "x": list(range(0,i)), "y": [i]*i
    }
    plots[i]=hv.Curve(data).opts(width=500)
hvmap = hv.HoloMap(plots)
left_pos = pn.widgets.IntSlider(value=5, step=1, start=-1000, end=5, name="x")
top_pos = pn.widgets.IntSlider(value=5, step=1, start=5, end=200, name="y")
style = pn.pane.HTML(height=0, width=0, sizing_mode="fixed", margin=0)
css = pn.widgets.Ace(height=150)
@pn.depends(left_pos=left_pos, top_pos=top_pos, watch=True)
def _update_css(left_pos, top_pos):
    # Rebuild the CSS whenever either slider changes.
    value = f"""
.bk.panel-widget-box {{
    left: {left_pos}px !important;
    top: {top_pos}px !important;
}}
"""
    css.value = value
    style.object = "<style>" + value + "</style>"
pn.Column("""# How to overlay widgets on HoloViews Map?
We will be using CSS to overlay and Panel to create this tool""",
hvmap,
"## Settings",
left_pos,
top_pos,
css,
style,
).servable()
All credit goes to Marc Skov Madsen, who wrote the original answer.

Grouping and heading pandas dataframe

I have the following dataframe of securities and computed a 'liquidity score' in the last column, where 1 = liquid, 2 = less liquid, and 3 = illiquid. I want to group the securities (dynamically) by their liquidity. Is there a way to group them and include some kind of header for each group? How can this best be achieved? Below is the code and an example of how it is supposed to look.
import pandas as pd
df = pd.DataFrame({'ID':['XS123', 'US3312', 'DE405'], 'Currency':['EUR', 'EUR', 'USD'], 'Liquidity score':[2,3,1]})
df = df.sort_values(by=["Liquidity score"])
print(df)
# 1 = liquid, 2 = less liquid, 3 = illiquid
Add labels for liquidity score
The following replaces the numbers in Liquidity score with labels:
df['grp'] = df['Liquidity score'].replace({1:'Liquid', 2:'Less liquid', 3:'Illiquid'})
Headers for each group
As per your comment, find below a solution to do this.
Let's illustrate this with a small data example.
import numpy as np
df = pd.DataFrame({'ID':['XS223', 'US934', 'US905', 'XS224', 'XS223'], 'Currency':['EUR', 'USD', 'USD','EUR','EUR',]})
Insert a header on specific rows using np.insert.
df = pd.DataFrame(np.insert(df.values, 0, values=["Liquid", ""], axis=0))
df = pd.DataFrame(np.insert(df.values, 2, values=["Less liquid", ""], axis=0))
df.columns = ['ID', 'Currency']
Using Pandas styler, we can add a background color, change font weight to bold and align the text to the left.
# hide_index() was removed in pandas 2.0; on pandas >= 1.4 use .hide(axis="index") instead
df.style.hide_index().set_properties(subset = pd.IndexSlice[[0,2], :], **{'font-weight' : 'bold', 'background-color' : 'lightblue', 'text-align': 'left'})
You can add a new column like this:
df['group'] = np.select(
[
df['Liquidity score'].eq(1),
df['Liquidity score'].eq(2)
],
[
'Liquid','Less liquid'
],
default='Illiquid'
)
Then set it as part of the index, so you can select a group through the index:
df.set_index(['group','ID'], inplace=True)
df.loc['Less liquid',:]
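To place the header rows dynamically rather than at hard-coded positions, one option is to loop over a groupby and concatenate a one-row header in front of each group. This is only a sketch using the frame from the question; labels, pieces, with_headers and header_rows are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["XS123", "US3312", "DE405"],
    "Currency": ["EUR", "EUR", "USD"],
    "Liquidity score": [2, 3, 1],
})
labels = {1: "Liquid", 2: "Less liquid", 3: "Illiquid"}

pieces = []
# groupby sorts by the score, so groups arrive as 1, 2, 3.
for score, group in df.groupby("Liquidity score"):
    header = pd.DataFrame({"ID": [labels[int(score)]], "Currency": [""]})
    pieces.append(header)
    pieces.append(group[["ID", "Currency"]])

with_headers = pd.concat(pieces, ignore_index=True)

# Positions of the header rows, e.g. for a Styler subset.
header_rows = with_headers.index[with_headers["ID"].isin(list(labels.values()))].tolist()
```

header_rows can then replace the fixed [0,2] in the pd.IndexSlice used for the styling step.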

journal quality kde plots with seaborn/pandas

I'm trying to do some comparative analysis for a publication. I came across seaborn and pandas and really like the ease with which I can create the analysis that I want. However, I find the manuals a bit scanty on the things that I'm trying to understand about the example plots and how to modify the plots to my needs. I'm hoping for some advice here on how to get the plots I want. Perhaps pandas/seaborn is not what I need.
So, I would like to create subplots, (3,1) or (2,3), of the following figure:
Questions:
I would like the attached plot to have a title on the colorbar. I'm not sure if this is possible, or exactly what is shown, i.e., is it relative frequency, occurrence, a percentage, etc.? How can I put an explanatory title on the colorbar (oriented vertically)?
The text is a nice addition. The pearsonr is the correlation, but I'm not sure what p is. My guess is that it is showing the lag? If so, how can I remove the p in the text?
I would like to make the same kind of figure for different variables and put it all in a subplot.
Here's the code I pieced together from the seaborn manual/examples and from other users here on SO (thanks guys).
import netCDF4 as nc
import pandas as pd
import xarray as xr
import numpy as np
import seaborn as sns
import pdb
import matplotlib.pyplot as plt
from scipy import stats, integrate
import matplotlib as mpl
import matplotlib.ticker as tkr
import matplotlib.gridspec as gridspec
sns.set(style="white")
sns.set(color_codes=True)
octp = [622.0, 640.0, 616.0, 731.0, 668.0, 631.0, 641.0, 589.0, 801.0,
828.0, 598.0, 742.0,665.0, 611.0, 773.0, 608.0, 734.0, 725.0, 716.0,
699.0, 686.0, 671.0, 700.0, 656.0,686.0, 675.0, 678.0, 653.0, 659.0,
682.0, 674.0, 684.0, 679.0, 704.0, 624.0, 727.0,739.0, 662.0, 801.0,
633.0, 896.0, 729.0, 659.0, 741.0, 510.0, 836.0, 720.0, 685.0,430.0,
833.0, 710.0, 799.0, 534.0, 532.0, 605.0, 519.0, 850.0, 357.0, 858.0,
497.0,404.0, 456.0, 448.0, 836.0, 462.0, 381.0, 499.0, 673.0, 642.0,
641.0, 458.0, 809.0,562.0, 742.0, 732.0, 710.0, 658.0, 533.0, 811.0,
853.0, 856.0, 785.0, 659.0, 697.0,654.0, 673.0, 707.0, 711.0, 423.0,
751.0, 761.0, 638.0, 576.0, 538.0, 596.0, 718.0,843.0, 640.0, 647.0,
692.0, 599.0, 607.0, 537.0, 679.0, 712.0, 612.0, 641.0, 665.0,658.0,
722.0, 656.0, 656.0, 742.0, 505.0, 688.0, 805.0]
cctp = [482.0, 462.0, 425.0, 506.0, 500.0, 464.0, 486.0, 473.0, 577.0,
735.0, 390.0, 590.0,464.0, 417.0, 722.0, 410.0, 679.0, 680.0, 711.0,
658.0, 687.0, 621.0, 643.0, 690.0,630.0, 661.0, 608.0, 658.0, 624.0,
646.0, 651.0, 634.0, 612.0, 636.0, 607.0, 539.0,706.0, 614.0, 706.0,
401.0, 720.0, 746.0, 511.0, 700.0, 453.0, 677.0, 637.0, 605.0,454.0,
733.0, 535.0, 725.0, 668.0, 513.0, 470.0, 589.0, 765.0, 596.0, 749.0,
462.0,469.0, 514.0, 511.0, 789.0, 647.0, 324.0, 555.0, 670.0, 656.0,
786.0, 374.0, 757.0,645.0, 744.0, 708.0, 497.0, 654.0, 288.0, 705.0,
703.0, 446.0, 675.0, 440.0, 652.0,589.0, 542.0, 661.0, 631.0, 343.0,
585.0, 632.0, 591.0, 602.0, 365.0, 535.0, 663.0,561.0, 448.0, 582.0,
591.0, 535.0, 475.0, 422.0, 599.0, 594.0, 569.0, 576.0, 622.0,483.0,
539.0, 515.0, 621.0, 443.0, 435.0, 502.0, 443.0]
cctp = pd.Series(cctp, name='CTP [hPa]')
octp = pd.Series(octp, name='CTP [hPa]')
formatter = tkr.ScalarFormatter(useMathText=True)
formatter.set_scientific(True)
formatter.set_powerlimits((-2, 2))
g = sns.jointplot(cctp,octp, kind="kde",size=8,space=0.2,cbar=True,
n_levels=50,cbar_kws={"format": formatter})
# add a line x=y
x0, x1 = g.ax_joint.get_xlim()
y0, y1 = g.ax_joint.get_ylim()
lims = [max(x0, y0), min(x1, y1)]
g.ax_joint.plot(lims, lims, ':k')
# save before show(): with an interactive backend, savefig after show() can write an empty figure
plt.savefig('test_fig.png')
plt.show()
I know I'm asking a lot here. So I put the questions in order of priority.
1: To set the colorbar label, you can add the label key to the cbar_kws dict:
cbar_kws={"format": formatter, "label": 'My colorbar'}
2: To change the stats label, first wrap stats.pearsonr in a function that returns only the first value (the correlation coefficient) instead of the (pearsonr, p) tuple. The second value, p, is the p-value of the correlation test, not a lag:
pr = lambda a, b: stats.pearsonr(a, b)[0]
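Then, you can pass that function through jointplot's stat_func kwarg (available in older seaborn releases; the statistical annotation was deprecated in 0.9 and later removed):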
stat_func=pr
and finally, you need to change the annotation to get the label right:
annot_kws={'stat':'pearsonr'}
Putting that all together:
pr = lambda a, b: stats.pearsonr(a, b)[0]
g = sns.jointplot(cctp,octp, kind="kde",size=8,space=0.2,cbar=True,
n_levels=50,cbar_kws={"format": formatter, "label": 'My colorbar'},
stat_func=pr, annot_kws={'stat':'pearsonr'})
3: I don't think it's possible to put everything in a subplot with jointplot. Happy to be proven wrong there though.
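If the marginal distributions are not essential, one workaround is to skip jointplot and draw the 2D density directly on ordinary matplotlib subplots. The sketch below swaps in scipy.stats.gaussian_kde with contourf instead of seaborn, and uses randomly generated stand-in data rather than the CTP series from the question; the grid size, level count and colorbar label are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-ins for three (x, y) variable pairs.
pairs = []
for _ in range(3):
    x = rng.normal(600, 100, 120)
    pairs.append((x, x + rng.normal(0, 50, 120)))

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (x, y) in zip(axes, pairs):
    # Estimate the joint density and evaluate it on a regular grid.
    kde = stats.gaussian_kde(np.vstack([x, y]))
    xx, yy = np.meshgrid(np.linspace(x.min(), x.max(), 60),
                         np.linspace(y.min(), y.max(), 60))
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    filled = ax.contourf(xx, yy, density, levels=20)
    fig.colorbar(filled, ax=ax, label="Estimated density")
    # Keep only the correlation coefficient, dropping the p-value.
    r = stats.pearsonr(x, y)[0]
    ax.set_title(f"pearsonr = {r:.2f}")
```

Each panel gets its own labelled colorbar, which is why the figure ends up with six axes in total.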