Pandas styling - change font size and format float/apply background gradient - pandas

I am building an application that displays stock correlation data in various visual forms, including a matrix with a heatmap applied. My heatmap is created by passing the correlation matrix dataframe into an ipywidgets Output widget, so I can display it as part of a VBox later on. I have successfully applied a background gradient and formatted my numbers to 2 decimal places. Can anyone help me edit the function to also reduce the font size? I just want to shrink it a little.
Note: I chose dataframe styling over matplotlib because I had a number of issues getting the output to display the way I wanted. I also have a function that exports the dataframe to Excel with the styling applied.
I have tried putting the following line of code at the beginning of my notebook so I can leave it outside of the function, but it seems to get ignored once the dataframe is passed to Output.
pd.options.display.float_format = "{:,.2f}".format
Here is my code sample:
import seaborn as sns
import ipywidgets as ipw
import pandas as pd
import numpy as np
from IPython.display import display
# Sample data
data = np.random.randint(5, 30, size=500)
df = pd.DataFrame(data.reshape((50, 10)))
corr = df.corr()
# Function produces the styled dataframe inside an Output widget
def output_heatmap_df(df):
    out = ipw.Output()
    with out:
        display(df.style
                .background_gradient(cmap=sns.diverging_palette(220, 10, as_cmap=True), axis=None)
                .format("{:,.2f}"))
    out.layout.width = '1600px'
    return out
output_heatmap_df(corr)

In case anyone should come across this, the below code worked for me in the end:
def output_heatmap_df(df):
    out = ipw.Output()
    with out:
        display(df.style
                .background_gradient(cmap=sns.diverging_palette(220, 10, as_cmap=True), axis=None)
                .format("{:,.2f}")
                # shrink and centre the text in the data cells
                .set_properties(**{'text-align': 'center', 'font-size': '10px'})
                # apply the same size to the row/column header cells (th)
                .set_table_styles([{'selector': 'th', 'props': [('text-align', 'center'), ('font-size', '10px')]}])
                )
    out.layout.width = '1600px'
    return out

Related

My Bokeh code runs fine, but outputs a blank chart. What am I doing wrong?

I am trying to run some simple code that reads a CSV file and plots the data as a line graph. The code runs fine and gives me the output below, but for some reason it shows a very odd date format on the x-axis, which leads to a very odd line with several outliers (which are not actually in the data). Could someone help?
Date,Value
01/11/2020,4.5202
01/12/2020,4.6555
01/01/2021,4.7194
01/02/2021,4.7317
01/03/2021,4.6655
01/04/2021,4.4641
01/05/2021,4.3875
01/06/2021,4.3560
01/07/2021,4.3318
01/08/2021,4.3607
01/09/2021,4.4853
01/10/2021,4.6456
01/11/2021,5.2262
01/12/2021,5.3259
01/01/2022,5.3820
01/02/2022,5.3855
01/03/2022,5.2673
01/04/2022,4.9346
01/05/2022,4.7287
01/06/2022,4.6274
01/07/2022,4.6632
01/08/2022,4.6929
01/09/2022,4.7841
01/10/2022,4.9572
01/11/2022,5.4293
01/12/2022,5.5214
01/01/2023,5.5697
01/02/2023,5.5738
01/03/2023,5.4550
01/04/2023,5.1962
01/05/2023,4.9534
01/06/2023,4.8514
01/07/2023,4.8112
01/08/2023,4.8415
01/09/2023,4.9338
01/10/2023,5.1461
01/11/2023,5.6022
01/12/2023,5.6960
01/01/2024,5.7451
01/02/2024,5.7499
01/03/2024,5.6308
01/04/2024,5.2752
01/05/2024,5.0306
01/06/2024,4.9282
01/07/2024,4.8877
01/08/2024,4.9188
01/09/2024,5.0127
01/10/2024,5.2100
01/11/2024,5.6716
01/12/2024,5.7669
01/01/2025,5.8176
01/02/2025,5.8229
01/03/2025,5.7031
01/04/2025,5.2633
01/05/2025,5.0164
01/06/2025,4.9133
01/07/2025,4.8730
01/08/2025,4.9053
01/09/2025,5.0005
01/10/2025,5.3274
01/11/2025,5.6325
01/12/2025,5.7293
import pandas as pd
# Read in the CSV file: df
df = pd.read_csv('TTFcurve.csv', parse_dates=['Date'])
# Import figure from bokeh.plotting
from bokeh.plotting import figure, output_file, show
output_file("lines.html")
# Create the figure: p
#x = df.Date
#y = df.Value
p = figure(x_axis_label='Date', y_axis_label='Value')
# Plot Value against Date as a red line
p.line(df['Date'], df['Value'], line_color="red")
# Show the result (saved to lines.html, as set by output_file above)
show(p)
You have to tell Bokeh that your X axis is datetime:
p = figure(..., x_axis_type='datetime')
Regarding the outliers, check the data: I'm almost certain that Bokeh cannot "invent" new points here. If you are sure that your data is fine, please post it so that the plot above can be reproduced and checked.
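One likely cause of the odd dates and the apparent outliers, not mentioned in the answer: the dates are written day first (01/11/2020 is 1 November 2020), but read_csv with parse_dates reads ambiguous dates month first by default. Each year's twelve points then land on 1 to 12 January, which would produce this kind of jagged line. A sketch using the first rows of the question's data, showing that dayfirst=True fixes the parsing:

```python
import io

import pandas as pd

# First few rows of the question's CSV, inlined so the sketch is self-contained.
csv_text = """Date,Value
01/11/2020,4.5202
01/12/2020,4.6555
01/01/2021,4.7194
01/02/2021,4.7317
"""

# Default parsing reads 01/11/2020 as month/day/year, i.e. 11 January 2020.
month_first = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# dayfirst=True reads the same string as 1 November 2020.
day_first = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], dayfirst=True)

print(month_first["Date"].tolist())
print(day_first["Date"].tolist())
```

The corrected frame can then be plotted with x_axis_type='datetime' passed to figure(), as the answer suggests.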

Holoviz panel will not print pandas dataframe row in Jupyter notebook

I'm trying to recreate the first panel.interact example in the Holoviz tutorial using a Pandas dataframe instead of a Dask dataframe. I get the slider, but the pandas dataframe row does not show.
See the original example at: http://holoviz.org/tutorial/Building_Panels.html
I've tried using Dask as in the Holoviz example. Dask rows print out just fine, which suggests that Panel treats Dask dataframe rows differently from Pandas dataframe rows when printing. Here's my minimal code:
import pandas as pd
import panel
l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
def select_row(rowno=0):
    row = df.loc[rowno]
    return row
panel.extension()
panel.extension('katex')
panel.interact(select_row, rowno=(0, 5))
I've included a line with the katex extension, because without it, I get a warning that it is needed. Without it, I don't even get the slider.
I can call the select_row(rowno=0) function separately in a Jupyter cell and get a nice printout of the row, so it appears the function is working as it should.
Any help in getting this to work would be most appreciated. Thanks.
Got a solution. With Pandas, loc[rowno:rowno] returns a pandas.core.frame.DataFrame object of length 1 which works fine with panel while loc[rowno] returns a pandas.core.series.Series object which does not work so well. Thus modifying the select_row() function like this makes it all work:
def select_row(rowno=0):
    row = df.loc[rowno:rowno]
    return row
Still not sure, however, why panel will print out the Dataframe object and not the Series object.
Note: with iloc the end of the slice is exclusive, so add 1: df.iloc[rowno:rowno+1].
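The type difference behind this fix can be checked in plain pandas, without Panel, using the question's data. df.loc[[rowno]] (a list containing one label) is an extra option not in the answer that also returns a one-row DataFrame. Note that loc works on index labels and iloc on positions; they coincide here only because df has the default RangeIndex.

```python
import pandas as pd

l1 = ['a', 'b', 'c', 'd', 'a', 'b']
l2 = [1, 2, 3, 4, 5, 6]
df = pd.DataFrame({'cat': l1, 'val': l2})

rowno = 2
as_series = df.loc[rowno]           # single label -> Series (the row turned sideways)
as_slice = df.loc[rowno:rowno]      # label slice, end inclusive -> 1-row DataFrame
as_list = df.loc[[rowno]]           # list of labels -> 1-row DataFrame
as_iloc = df.iloc[rowno:rowno + 1]  # position slice, end exclusive -> 1-row DataFrame

for name, obj in [("loc[n]", as_series), ("loc[n:n]", as_slice),
                  ("loc[[n]]", as_list), ("iloc[n:n+1]", as_iloc)]:
    print(name, type(obj).__name__, obj.shape)
```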

Using multiple sliders to manipulate curves in a single graph

I created the following Jupyter Notebook, in which three functions are shifted using three sliders. In the future I would like to generalise it to an arbitrary number of curves (i.e. n curves). However, right now the graph updates very slowly, and the graph itself doesn't stay fixed in its cell. I didn't receive any error message, but I'm pretty sure there is a mistake in the update function.
Here is the code:
from ipywidgets import interact
import ipywidgets as widgets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
x = np.linspace(0, 2*np.pi, 2000)
y1=np.exp(0.3*x)*np.sin(5*x)
y2=5*np.exp(-x**2)*np.sin(20*x)
y3=np.sin(2*x)
m=[y1,y2,y3]
num_curve=3
def shift(v_X):
    v_T=v_X
    vector=np.transpose(m)
    print(' ')
    print(v_T)
    print(' ')
    curve=vector+v_T
    return curve
controls=[]
o='vertical'
for i in range(num_curve):
    title="x%i" % (i%num_curve+1)
    sl=widgets.FloatSlider(description=title,min=-2.0, max=2.0, step=0.1,orientation=o)
    controls.append(sl)
Dict = {}
for c in controls:
    Dict[c.description] = c
uif = widgets.HBox(tuple(controls))
def update_N(**xvalor):
    xvalor=[]
    for i in range(num_curve):
        xvalor.append(controls[i].value)
    curve=shift(xvalor)
    new_curve=pd.DataFrame(curve)
    new_curve.plot()
    plt.show()
outf = widgets.interactive_output(update_N,Dict)
display(uif, outf)
Your function runs on every single value the slider moves through, which is probably why it takes so long. You can change this by adding continuous_update=False to your FloatSlider call (the sl=widgets.FloatSlider(...) line inside the loop).
sl=widgets.FloatSlider(description=title,
min=-2.0,
max=2.0,
step=0.1,
orientation=o,
continuous_update=False)
This gave me much better performance, and the chart flickers much less because there are vastly fewer redraws. Does this help?
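For the n-curve generalisation, the question's shift() already relies on NumPy broadcasting: adding a length-n vector of offsets to an array of shape (len(x), n) shifts each column by its own offset. A hypothetical shift_curves helper that accepts any number of curves (and drops the print calls) might look like this; the slider loop could then use num_curve = len(curves) together with continuous_update=False:

```python
import numpy as np


def shift_curves(curves, offsets):
    """Hypothetical helper: shift each curve vertically by its own offset.

    curves:  sequence of n equal-length 1-D arrays, one per curve
    offsets: sequence of n numbers, e.g. the current slider values
    Returns shape (len(x), n): one column per curve, the layout that
    pd.DataFrame(...).plot() draws as n separate lines.
    """
    stacked = np.column_stack(curves)                   # (len(x), n)
    return stacked + np.asarray(offsets, dtype=float)   # broadcasts across columns


x = np.linspace(0, 2 * np.pi, 2000)
curves = [np.exp(0.3 * x) * np.sin(5 * x),
          5 * np.exp(-x ** 2) * np.sin(20 * x),
          np.sin(2 * x)]
shifted = shift_curves(curves, [0.5, -1.0, 2.0])
```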

Change y-axis scaling fontsize in pandas dataframe.plot()

I am changing the font sizes in my Python pandas DataFrame plot. The only part that I could not change is the scaling of the y-axis values (see the figure below).
Could you please help me with that?
Added:
Here is the simplest code to reproduce my problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
start = 10**12
finish = 1.1*10**12
y = np.linspace(start, finish)
pd.DataFrame(y).plot()
plt.tick_params(axis='x', labelsize=17)
plt.tick_params(axis='y', labelsize=17)
You will see that this results in a graph similar to the one above: the scaling of the y-axis does not change.
There are only so many features that you can control through the plotting capabilities of pandas, which leverages matplotlib. I found that seaborn makes it a lot easier to produce pretty charts, and gives you more control over the parameters of your plots.
This is not the most elegant solution, but it works; however, it has a seaborn dependency:
%pylab inline
import pandas as pd
import seaborn as sns
import numpy as np
sns.set(style="darkgrid")
sns.set(font_scale=1.5)
start = 10**12
finish = 1.1*10**12
y = np.linspace(start , finish)
pd.DataFrame(y).plot()
plt.tick_params(axis='x', labelsize=17)
plt.tick_params(axis='y', labelsize=17)
I use Jupyter Notebook, which is why I use %pylab inline. The key element here is the use of
font_scale=1.5
which you can set to whatever value produces the result you want. This is what I get:
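Assuming the "scaling" in the question is the offset text (the 1e12 label matplotlib draws at the top of the y-axis for large values), it is a separate Text object from the tick labels, so it can also be resized directly, without seaborn. A sketch reusing the question's data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter

import numpy as np
import pandas as pd

start = 10**12
finish = 1.1 * 10**12
y = np.linspace(start, finish)

ax = pd.DataFrame(y).plot()
ax.tick_params(axis='both', labelsize=17)    # resizes the tick labels only
ax.yaxis.get_offset_text().set_fontsize(17)  # the scale factor above the y-axis
```

The offset text takes its initial size from rcParams['ytick.labelsize'] when the axes are created, which is probably why seaborn's font_scale, which scales that setting, has an effect. In a notebook, leave out the matplotlib.use line.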

How to Render Math Table Properly in IPython Notebook

The math problem that I'm solving gives different analytical solutions in different scenarios, and I would like to summarize the result in a nice table. IPython Notebook renders the list nicely:
for example:
import sympy
from pandas import DataFrame
from sympy import *
init_printing()
a, b, c, d = symbols('a b c d')
t = [[a/b, b/a], [c/d, d/c]]
t
However, when I summarize the answers into a table using DataFrame, the math cannot be rendered any more:
df = DataFrame(t, index=['Situation 1', 'Situation 2'], columns=['Answer1','Answer2'])
df
"print df.to_latex()" also gives the same result. I also tried "print(latex(t))", but after compiling in LaTeX it gives the following, which is alright, but I still need to convert it to a table manually:
How should I use DataFrame properly in order to render the math properly? Or is there any other way to export the math result into a table in Latex? Thanks!
Update: 01/25/14
Thanks again to @Jakob for solving the problem. It works perfectly for simple matrices, though there are still some minor problems with more complicated math expressions. But I guess, as @asmeurer said, perfection requires an update to IPython and Pandas.
Update: 01/26/14
If I render the result directly, i.e. just print the list, it works fine:
MathJax is currently not able to render tables, hence the most obvious approach (pure LaTeX) does not work.
However, following the advice of @asmeurer, you should use an HTML table and render the cell contents as LaTeX. In your case this can easily be achieved with the following intermediate step:
from sympy import latex
tl = ['$' + latex(tc) + '$' for tc in t]
df = DataFrame(tl, index=['Situation 1', 'Situation 2'], columns=['Answer'])
df
which gives:
Update:
In the case of two-dimensional data, the simple approach above will not work directly. To cope with this, NumPy's shape, reshape and ravel functions can be used:
import numpy as np
t = [[a/b, b/a],[a*a,b*b]]
tl = np.reshape(['$' + latex(tc) + '$' for tc in np.ravel(t)], np.shape(t))  # a list, since np.reshape cannot reshape a Python 3 map iterator
df = DataFrame(tl, index=['Situation 1', 'Situation 2'], columns=['Answer 1','Answer 2'])
df
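As an alternative to the ravel/reshape round trip, a nested list comprehension keeps the two-dimensional shape directly. It also works in Python 3, where map() returns a lazy iterator that np.reshape cannot reshape. A sketch with the same t:

```python
import sympy
from pandas import DataFrame
from sympy import latex

a, b = sympy.symbols('a b')
t = [[a/b, b/a], [a*a, b*b]]

# Wrap every cell in $...$ so the notebook's MathJax renders it as math,
# keeping one inner list per table row.
tl = [['$' + latex(cell) + '$' for cell in row] for row in t]

df = DataFrame(tl, index=['Situation 1', 'Situation 2'],
               columns=['Answer 1', 'Answer 2'])
```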
This gives:
Update 2:
Pandas truncates cell content when the string exceeds a certain length. For example, a more complicated expression like
t1 = [a/2+b/2+c/2+d/2]
tl = np.reshape(['$' + latex(tc) + '$' for tc in np.ravel(t1)], np.shape(t1))
df = DataFrame(tl, index=['Situation 1'], columns=['Answer 1'])
df
gives:
To cope with this issue, a pandas display option has to be changed, in this case max_colwidth. Its default value is 50, so let's change it to 100:
import pandas as pd
pd.options.display.max_colwidth=100
df
gives:
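If the wider columns are only needed for one table, pd.option_context changes the option temporarily instead of for the whole session. A sketch with a made-up long cell:

```python
import pandas as pd

# Made-up cell longer than the default display.max_colwidth of 50 characters.
long_text = "$" + "x" * 55 + "$"
df = pd.DataFrame([long_text], index=['Situation 1'], columns=['Answer 1'])

default_repr = repr(df)  # cell is cut to 50 characters, ending in "..."
with pd.option_context('display.max_colwidth', 100):
    full_repr = repr(df)  # limit raised only inside this block
```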