How do you add dataclasses as valid index values to a plotly chart? - plotly-python

I am trying to switch from the matplotlib pandas plotting backend to plotly. However, I am being held back by a common occurrence of this error:
TypeError: Object of type Quarter is not JSON serializable
Where Quarter is a dataclass in my codebase.
For a minimal example, consider:
@dataclass
class Foo:
    val: int
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
df.plot.scatter(x='x', y='y')
As expected, the above returns:
TypeError: Object of type Foo is not JSON serializable
Now, I don't expect plotly to be magical, but adding a __float__ magic method allows the Foo objects to be used with the matplotlib backend:
# This works
@dataclass
class Foo:
    val: int
    def __float__(self):
        return float(self.val)
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
df.plot.scatter(x='x', y='y')
How can I update my dataclass to allow for it to be used with the plotly backend?

You can have pandas cast the column to float before the plotting backend is invoked.
from dataclasses import dataclass
import pandas as pd
@dataclass
class Foo:
    val: int
    def __float__(self):
        return float(self.val)
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
# astype(float) uses Foo.__float__ and returns a new Series; df itself is unchanged
df["x"].astype(float)
pd.options.plotting.backend = "plotly"
# cast "x" on the fly, then plot the converted frame
df.assign(x=lambda d: d["x"].astype(float)).plot.scatter(x='x', y='y')
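Why the cast helps can be seen without plotly at all. The error message says the object is not JSON serializable, so the sketch below uses the standard-library json module as a stand-in for whatever encoder plotly uses internally (that equivalence is an assumption, not something checked against plotly's source). It only shows that a dataclass instance is rejected while its float value is accepted.

```python
import json
from dataclasses import dataclass


@dataclass
class Foo:
    val: int

    def __float__(self):
        return float(self.val)


points = [Foo(i) for i in range(3)]

# The standard JSON encoder has no rule for arbitrary objects, dataclass or not.
try:
    json.dumps(points)
    raw_ok = True
except TypeError:
    raw_ok = False

# After an explicit cast only plain floats are left to serialize.
as_floats = json.dumps([float(p) for p in points])
```

Defining __float__ alone does not change what the encoder sees, which is why the conversion has to happen before the data reaches the plotting backend.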
Monkey patching
If you don't want to change your plotting code, you can monkey patch plotly's implementation of the pandas plotting API:
https://pandas.pydata.org/pandas-docs/stable/development/extending.html#plotting-backends
from dataclasses import dataclass
import pandas as pd
import wrapt, json
import plotly
# wrap plotly.plot, the function the pandas plotting backend API dispatches to
@wrapt.patch_function_wrapper(plotly, 'plot')
def new_plot(wrapped, instance, args, kwargs):
    try:
        # tolist() yields plain Python values, so only unserializable objects raise here
        json.dumps(args[0][kwargs["x"]].tolist())
    except TypeError:
        # fall back to the objects' __float__ (note: this modifies the caller's DataFrame)
        args[0][kwargs["x"]] = args[0][kwargs["x"]].astype(float)
    return wrapped(*args, **kwargs)
@dataclass
class Foo:
    val: int
    def __float__(self):
        return float(self.val)
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
df["x"].astype(float)
pd.options.plotting.backend = "plotly"
df.plot.scatter(x='x', y='y')

Related

implement shortcut for pandas code by extending it

Pandas has a very nice feature for copying a DataFrame to the clipboard. This helps a lot when you want to take the df output and analyze or inspect it further in Excel.
df={'A':['x','y','x','z','y'],
'B':[1,2,2,2,2],
'C':['a','b','a','d','d']}
df=pd.DataFrame(df)
df.to_clipboard(excel=True,index=False)
However, I don't want to type df.to_clipboard(excel=True,index=False) each time I need to copy the df to the clipboard. Is there a way I can do something like df.clip()?
I tried to implement it like this, but it does not work.
class ExtDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(ExtDF, self).__init__(*args, **kwargs)
    @property
    def _constructor(self):
        return ExtDF
    def clip(self):
        return self.to_clipboard(excel=True,index=False)
I would monkey patch pandas.DataFrame rather than defining a subclass:
# define a function and monkey patch pandas.DataFrame
def clipxl(self):
    return self.to_clipboard(excel=True, index=False)
# note: this replaces the built-in DataFrame.clip (value clipping); pick another name to keep it
pd.DataFrame.clip = clipxl
# now let's try it
df={'A': ['x','y','x','z','y'],
'B': [1,2,2,2,2],
'C': ['a','b','a','d','d']}
df=pd.DataFrame(df)
df.clip()
clipboard content:
A B C
x 1 a
y 2 b
x 2 a
z 2 d
y 2 d
using a module
expand_pandas.py
import pandas as pd
# define a function and monkey patch pandas.DataFrame
def clipxl(self):
    return self.to_clipboard(excel=True,index=False)
def hello(self):
    return 'Hello World!'
pd.DataFrame.clip = clipxl
pd.DataFrame.hello = hello
In your environment:
import pandas as pd
import expand_pandas
df = pd.DataFrame({'A': ['x','y','x','z','y'],
'B': [1,2,2,2,2],
'C': ['a','b','a','d','d']})
df.hello()
# 'Hello World!'
df.clip()
# sends to clipboard
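One caveat with the name: pandas already ships a DataFrame.clip method (it trims values at given thresholds), so assigning to pd.DataFrame.clip replaces it. A way to avoid the clash, sketched here as an alternative and not as part of the answer above, is pandas' accessor registration API. The namespace "xl" and the class name are made up for illustration.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("xl")  # "xl" is a made-up namespace
class ExcelClipboardAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def clip(self):
        # Same call as in the question, reached as df.xl.clip()
        return self._obj.to_clipboard(excel=True, index=False)


df = pd.DataFrame({'A': ['x', 'y'], 'B': [1, 5]})

# The built-in DataFrame.clip (value clipping) is left untouched:
clipped = df[['B']].clip(upper=2)
```

With this in place, df.xl.clip() copies the frame to the clipboard while df.clip(...) keeps its original meaning.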
Instead of the object oriented approach you could write a function that does it and pass the DataFrame as a parameter.
Alternatively, instead of modifying the DataFrame class itself, you can create a class that contains your dataframe and define a method that does what you want.

How to convert a Pydantic model in FastAPI to a Pandas DataFrame?

I am trying to convert a Pydantic model to a Pandas DataFrame, but I am getting various errors.
Here is the code:
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import sklearn
import pandas as pd
import numpy as np
class Userdata(BaseModel):
    current_res_month_dec: Optional[int] = 0
    current_res_month_nov: Optional[int] = 0
async def return_recurrent_user_predictions_gb(user_data: Userdata):
    empty_dataframe = pd.DataFrame([Userdata(**{
        'current_res_month_dec': user_data.current_res_month_dec,
        'current_res_month_nov': user_data.current_res_month_nov})], ignore_index=True)
This is the DataFrame that is returned when trying to execute it through /docs in my local environment:
Response body
{
  "0": {
    "0": [
      "current_res_month_dec",
      0
    ]
  },
  "1": {
    "0": [
      "current_res_month_nov",
      0
    ]
  }
}
But if I try to use this DataFrame for a prediction:
model_has_afternoon = pickle.load(open('./models/model_gbclf_prob_current_product_has_afternoon.pickle', 'rb'))
result_afternoon = model_has_afternoon.predict_proba(empty_dataframe)[:, 1]
I get this error:
ValueError: setting an array element with a sequence.
I have tried building my own DataFrame before, and the predictions should work with a DataFrame.
You first need to convert the Pydantic model into a dictionary using Pydantic's dict() method. Passing the model object itself to pandas.DataFrame(), as in the question, makes pandas iterate over the model, and iterating a Pydantic model yields (field name, value) tuples; that is why each cell in the response above holds a pair rather than a number, which would also explain the ValueError raised by predict_proba(). Python's dict() function and the .__dict__ attribute have been found to be faster alternatives to Pydantic's dict() method (see this answer), but since you are using a Pydantic model, it might be best to use Pydantic's dict() method. Then pass the dictionary to pandas.DataFrame() surrounded by square brackets; for example, pd.DataFrame([data.dict()]). As described in this answer, this approach can be used when you need the keys of the passed dict to be the columns and the values to be the rows. If you need to specify a different orientation, you can also use pandas.DataFrame.from_dict().
Working Example
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
import pandas as pd
app = FastAPI()
class Userdata(BaseModel):
    col1: Optional[int] = 0
    col2: Optional[int] = 0
    col3: str = "foo"
@app.post('/submit')
def submit_data(data: Userdata):
    # one dict per row: keys become columns, values become the row
    df = pd.DataFrame([data.dict()])
    return "Success"
More Options
Since you mentioned that you would like to use the DataFrame for Machine Learning predictions, note that there are a few other ways to pass the data to the predict() and predict_proba() functions that do not require creating a DataFrame. These options include:
model.predict([[data.col1, data.col2, data.col3]])
and
model.predict([list(data.dict().values())])
Please have a look at this answer for more details. In case you would also need to respond back to the client with a DataFrame in JSON format, please take a look here.
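The difference between passing the model and passing its dictionary can be checked outside FastAPI. The sketch below is a minimal illustration, not the answer's exact code: to_record is a hypothetical helper, and the hasattr check is there because Pydantic v2 names the method model_dump() while v1 only has dict().

```python
from typing import Optional

import pandas as pd
from pydantic import BaseModel


class Userdata(BaseModel):
    col1: Optional[int] = 0
    col2: Optional[int] = 0


def to_record(model: BaseModel) -> dict:
    """Return the model's fields as a plain dict (hypothetical helper)."""
    # Pydantic v2 names this method model_dump(); v1 only has dict().
    if hasattr(model, "model_dump"):
        return model.model_dump()
    return model.dict()


data = Userdata(col1=3)

# Iterating the model itself yields (field name, value) pairs,
# which is what ended up in the cells of the question's DataFrame.
pairs = list(data)

# One dict per row: keys become columns, values become the row.
df = pd.DataFrame([to_record(data)])
```

The resulting frame has one numeric column per field, which is the shape a scikit-learn style predict_proba() call typically expects.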

Accessing methods within a class from bokeh FileInput widget

I am working on a Bokeh serve UI and am running into trouble interfacing a class (and its methods) with the FileInput widget. I am using a class (in this example, called "EIS_data") which, when instantiated, loads a file using pd.read_csv. The EIS_data class also has a method to plot the data in a particular way, and I'd like to be able to load the pandas dataframe and call and manipulate the data using the methods already in place in the class.
So far, I have been able to load the data successfully using the FileInput widget, but I can't figure out how to access the dataframe again once it's loaded in. In a standalone Jupyter notebook, I could run d = EIS_data("filename") and then d.plot() to load the data into a pandas dataframe and plot it according to the method defined in the EIS_data class, but I can't figure out how to replicate this in the UI code once the data are loaded using the FileInput widget.
Is there a way I can interface this with Bokeh widgets, such that I could simply add d.plot() to curdoc()? I have found a workaround using ColumnDataSource, but it seems a shame to redefine plotting methods and data handling when they are already defined in the class. Below are minimal working examples of the UI code and the class definition.
UI Code:
import numpy as np
import pandas as pd
from eis_analysis_trimmed import EIS_data
import bokeh
from bokeh.io import curdoc
from bokeh import layouts
from bokeh.layouts import column,row,gridplot
from bokeh.plotting import figure
from bokeh.models import *
import base64
import io
## Instantiate the EIS_data class for loading data
def load_data(f):
    return EIS_data(f)
## updater function called to load data with FileInput widget
## Must be decoded using base64
def load_file(attr, old, new):
    decoded = base64.b64decode(new)
    d = io.BytesIO(decoded)
    dat = load_data(d)
    print(dat.df)
    print(dat)
    print("EIS Data Uploaded Successfully")
    return dat
f_load = Paragraph(text="""Load Data""",height=15)
f = FileInput()
f.on_change('value',load_file)
curdoc().add_root(column(f))
and here is the EIS_data class:
import numpy as np
import pandas as pd  # needed for pd.read_csv and pd.to_numeric below
from scipy.optimize import curve_fit
from sklearn.metrics import r2_score
from bokeh.plotting import figure, show
from bokeh.models import LinearAxis, Range1d
from bokeh.resources import INLINE
import bokeh.io
#locally include javascript dependencies in html
bokeh.io.output_notebook(INLINE)
class EIS_data:
    def __init__(self, file_name, delimiter='\t',
                 header=0, f_low=None, f_high=None):
        #load eis data into a pandas dataframe
        eis_data = pd.read_csv(file_name, delimiter=delimiter, header=header)
        #iterate through all of the columns and check to see
        #if all of the values in that column are null
        #if they are, then remove that column
        for c in eis_data.columns:
            if eis_data[c].isnull().all():
                eis_data = eis_data.drop([c], axis=1)
        #make sure that the data are imported as floats and not strings
        eis_data = eis_data[['freq/Hz', 'Re(Z)/Ohm', '-Im(Z)/Ohm']]
        eis_data['freq/Hz'] = pd.to_numeric(eis_data['freq/Hz'])
        eis_data['Re(Z)/Ohm'] = pd.to_numeric(eis_data['Re(Z)/Ohm'])
        eis_data['-Im(Z)/Ohm'] = pd.to_numeric(eis_data['-Im(Z)/Ohm'])
        self.df = eis_data.sort_values(by='freq/Hz')
    def plot(self, fit_vals = None):
        plot = figure(title="Nyquist Plot",
                      x_axis_label='Re(Z) Ohm',
                      y_axis_label='-Im(Z) Ohm',
                      plot_width=600,
                      plot_height=600)
        plot.circle(self.df['Re(Z)/Ohm'], self.df['-Im(Z)/Ohm'],
                    size=7, color='navy', name='Data')
        return plot
EDIT: Adding the workaround using ColumnDataSource
from bokeh.io import curdoc  # needed for curdoc() at the end
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.models import *
from bokeh.models.widgets import FileInput
import base64
import io
from eis_analysis2 import EIS_data
# Instantiate the EIS_data class for loading data
def load_data(data):
    return EIS_data(data)
# updater function called to load data with FileInput widget
# Must be decoded using base64
def load_file(attr, old, new):
    decoded = base64.b64decode(new)
    d = io.BytesIO(decoded)
    dat = load_data(d)
    dat_df = dat.df
    # Replace plot data with data from newly-loaded file
    source.data = dict(freq=dat_df[dat_df.columns[0]], reZ=dat_df[dat_df.columns[1]], imZ=dat_df[dat_df.columns[2]])
    #phase,mag = bode_calc(reZ,imZ)
    print(dat_df)
    print("EIS Data Uploaded Successfully")
# Create Column Data Source that will be used by the plot
source = ColumnDataSource(data=dict(freq=[], reZ=[], imZ=[]))
##Make the nyquist plot
nyq_plot = figure(title="Nyquist Plot",
x_axis_label='Re(Z) Ohm',
y_axis_label='-Im(Z) Ohm',
plot_width=600,
plot_height=600)
nyq_plot.circle(x="reZ", y="imZ",source=source,size=7, color='navy', name='Data')
f = FileInput()
f.on_change('value', load_file)
layout = column(f, nyq_plot)
curdoc().add_root(layout)
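One thing worth noting about the first version of the UI code: the value returned by load_file has nowhere to go, because the callback is invoked by Bokeh and not by your own code. A common pattern is to store the loaded object on something that outlives the callback. The sketch below shows only that pattern, with pd.read_csv standing in for EIS_data and a hand-built base64 string standing in for what FileInput delivers; AppState and state are hypothetical names.

```python
import base64
import io

import pandas as pd


class AppState:
    """Hypothetical holder for whatever the upload callback produces."""

    def __init__(self):
        self.data = None


state = AppState()


def load_file(attr, old, new):
    # FileInput hands over the file contents as a base64 string.
    decoded = base64.b64decode(new)
    # Keep the result on the shared state object instead of returning it.
    state.data = pd.read_csv(io.BytesIO(decoded), delimiter='\t')


# Simulate an upload with a tiny tab-separated file.
raw = b"freq/Hz\tRe(Z)/Ohm\t-Im(Z)/Ohm\n1.0\t2.0\t3.0\n10.0\t5.0\t6.0\n"
load_file('value', '', base64.b64encode(raw).decode())
```

In the actual app, the callback could store EIS_data(io.BytesIO(decoded)) instead and then, still inside the callback, add the figure returned by its plot() method to the document, for example with curdoc().add_root(state.data.plot()). That part is untested here and assumes plot() returns a Bokeh figure, as it does in the class shown above.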

wrap pandas dataframe within class

I'd like to wrap a pandas dataframe in a custom class and extend it with custom methods, while still being able to use the common pandas syntax on it.
I implemented this:
import pandas as pd
class CustomDF():
    def __init__(self, df: pd.DataFrame):
        self.df = df
    def __getattr__(self, name):
        return getattr(self.df, name)
    def foo(self):
        print('foo')
        return
The problem with the above code is that I'd like to make the following lines work:
a = CustomDF()
a.iloc[0,1]
a.foo()
but if I try to access a column of the dataframe by doing
print(a['column_name'])
I get the error "TypeError: 'CustomDF' object is not subscriptable"
Any idea how to get subscripting to work without having to access .df first?
Thanks!
Try inheriting from the pandas.DataFrame class. Like so:
from pandas import DataFrame
class CustomDF(DataFrame):
    def foo(self):
        print('foo')
        return
a = CustomDF([0,1])
a.foo()
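If you would rather keep the wrapper than inherit, the reason the original attempt fails is that __getattr__ only covers ordinary attribute access. Python looks up special methods such as __getitem__ on the class, not on the instance, so a['column_name'] never reaches __getattr__. A minimal sketch of the wrapper with subscripting forwarded explicitly follows; the guard on the name df is my addition to avoid infinite recursion if the attribute is looked up before __init__ has run.

```python
import pandas as pd


class CustomDF:
    def __init__(self, df: pd.DataFrame):
        self.df = df

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Guard 'df' itself so that a
        # half-initialised instance (e.g. during copying) cannot recurse forever.
        if name == "df":
            raise AttributeError(name)
        return getattr(self.df, name)

    def __getitem__(self, key):
        # Subscripting is looked up on the class, so it must be forwarded explicitly.
        return self.df[key]

    def foo(self):
        return "foo"


a = CustomDF(pd.DataFrame({"column_name": [1, 2], "other": [3, 4]}))
column = a["column_name"]   # forwarded by __getitem__
cell = a.iloc[0, 1]         # forwarded by __getattr__
```

Note that anything returned by a forwarded call is a plain pandas object, not a CustomDF, and that assignment such as a['x'] = ... would need a matching __setitem__.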

picklable figures in matplotlib and Log10Transform

You may already know that matplotlib 1.2.0 has a new experimental feature: figures are picklable (they can be saved with the pickle module).
However, it doesn't work when one uses a log scale, e.g.:
import matplotlib.pyplot as plt
import numpy as np
import pickle
ax = plt.subplot(111)
x = np.linspace(0, 10)
y = np.exp(x)
plt.plot(x, y)
ax.set_yscale('log')
pickle.dump(ax, open('myplot.pickle', 'wb'))
results in:
PicklingError: Can't pickle <class 'matplotlib.scale.Log10Transform'>: attribute lookup matplotlib.scale.Log10Transform failed
Does anybody know a solution or workaround for this?
I've opened this as a bug report on matplotlib's github issue tracker. It's a fairly easy fix to implement on the matplotlib repository side (simply don't nest the Log10Transform class inside the LogScale class), but that doesn't really help you in being able to use this with mpl 1.2.0...
There is a solution to getting this to work for you in 1.2.0, but I warn you: it's not pretty!
Based on my answer to a pickling question it is possible to pickle nested classes (as Log10Transform is). All we need to do is to tell Log10Transform how to "reduce" itself:
import matplotlib.scale
class _NestedClassGetter(object):
    """
    When called with the containing class as the first argument,
    the name of the nested class as the second argument,
    and the state of the object as the third argument,
    returns an instance of the nested class.
    """
    def __call__(self, containing_class, class_name, state):
        nested_class = getattr(containing_class, class_name)
        # return an instance of a nested_class. Some more intelligence could be
        # applied for class construction if necessary.
        c = nested_class.__new__(nested_class)
        c.__setstate__(state)
        return c
def _reduce(self):
    # return a class which can return this class when called with the
    # appropriate tuple of arguments
    cls_name = matplotlib.scale.LogScale.Log10Transform.__name__
    call_args = (matplotlib.scale.LogScale, cls_name, self.__getstate__())
    return (_NestedClassGetter(), call_args)
matplotlib.scale.LogScale.Log10Transform.__reduce__ = _reduce
You might also decide to do this for other Log based transforms/classes, but for your example, you can now pickle (and successfully unpickle) your example figure:
import matplotlib.pyplot as plt
import numpy as np
import pickle
ax = plt.subplot(111)
x = np.linspace(0, 10)
y = np.exp(x)
plt.plot(x, y)
ax.set_yscale('log')
pickle.dump(ax, open('myplot.pickle', 'wb'))
plt.savefig('pickle_log.pre.png')
plt.close()
pickle.load(open('myplot.pickle', 'rb'))
plt.savefig('pickle_log.post.png')
I'm going to get on and fix this for mpl 1.3.x so that this nasty workaround isn't needed in the future :-) .
HTH,