Implement a shortcut for pandas code by extending DataFrame - pandas

Pandas has a very nice feature to save a df to the clipboard. This helps a lot when I want to take the df output and analyze/inspect it further in Excel.
import pandas as pd
df = {'A': ['x','y','x','z','y'],
      'B': [1,2,2,2,2],
      'C': ['a','b','a','d','d']}
df = pd.DataFrame(df)
df.to_clipboard(excel=True, index=False)
However, I don't want to type df.to_clipboard(excel=True,index=False) each time I need to copy the df to the clipboard. Is there a way I can do something like df.clip()?
I tried to implement it like this, but it does not work:
class ExtDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(ExtDF, self).__init__(*args, **kwargs)
    @property
    def _constructor(self):
        return ExtDF
    def clip(self):
        return self.to_clipboard(excel=True,index=False)
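For comparison, a subclass along these lines does work, with two caveats. First, only frames created as the subclass get the method; a plain pd.DataFrame (for example one returned by pd.read_csv) has to be wrapped as ExtDF(df) first. Second, DataFrame already has a built-in clip method (it bounds values to thresholds), so naming the shortcut clip shadows it. A minimal sketch, assuming the shortcut is renamed to the hypothetical clipxl:

```python
import pandas as pd


class ExtDF(pd.DataFrame):
    # pandas uses _constructor to build the results of operations such as
    # head() or boolean filtering, so those results stay ExtDF instances.
    @property
    def _constructor(self):
        return ExtDF

    def clipxl(self):
        # Hypothetical name, chosen so the built-in DataFrame.clip is not shadowed.
        return self.to_clipboard(excel=True, index=False)


plain = pd.DataFrame({'A': ['x', 'y', 'x', 'z', 'y'],
                      'B': [1, 2, 2, 2, 2],
                      'C': ['a', 'b', 'a', 'd', 'd']})
ext = ExtDF(plain)          # wrap an existing frame to get the extra method
subset = ext[ext['B'] > 1]  # still an ExtDF thanks to _constructor
# subset.clipxl() would copy the filtered rows to the clipboard
```

The plain frame is left untouched, which is the main difference from patching the DataFrame class itself.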

I would monkey patch pandas.DataFrame rather than defining a subclass:
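import pandas as pd
# define a function and monkey patch pandas.DataFrame
# note: this replaces the built-in DataFrame.clip (which bounds values to thresholds)
# for every DataFrame in the session; a name pandas does not already use avoids that
def clipxl(self):
    return self.to_clipboard(excel=True, index=False)
pd.DataFrame.clip = clipxl
# now let's try it
df = {'A': ['x','y','x','z','y'],
      'B': [1,2,2,2,2],
      'C': ['a','b','a','d','d']}
df = pd.DataFrame(df)
df.clip()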
Clipboard content:
A B C
x 1 a
y 2 b
x 2 a
z 2 d
y 2 d
Using a module
expand_pandas.py:
import pandas as pd
# define a function and monkey patch pandas.DataFrame
def clipxl(self):
    return self.to_clipboard(excel=True, index=False)
def hello(self):
    return 'Hello World!'
pd.DataFrame.clip = clipxl
pd.DataFrame.hello = hello
In your environment:
import pandas as pd
import expand_pandas  # importing the module is what applies the patches
df = pd.DataFrame({'A': ['x','y','x','z','y'],
                   'B': [1,2,2,2,2],
                   'C': ['a','b','a','d','d']})
df.hello()
# 'Hello World!'
df.clip()
# sends to clipboard
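Monkey patching as above changes the DataFrame class for the whole session. pandas also ships an official extension hook, pd.api.extensions.register_dataframe_accessor, which adds a namespace to every DataFrame instead of replacing existing methods. A minimal sketch; the accessor name xl and the method names clip and text are hypothetical choices:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("xl")
class ExcelAccessor:
    """Adds df.xl.clip() and df.xl.text() without touching DataFrame.clip."""

    def __init__(self, pandas_obj: pd.DataFrame):
        self._obj = pandas_obj

    def clip(self):
        # Same call as in the question: tab-separated, no index.
        self._obj.to_clipboard(excel=True, index=False)

    def text(self) -> str:
        # Essentially the text to_clipboard(excel=True, index=False) copies;
        # useful for previewing or testing without a clipboard backend.
        return self._obj.to_csv(sep="\t", index=False)


df = pd.DataFrame({'A': ['x', 'y', 'x', 'z', 'y'],
                   'B': [1, 2, 2, 2, 2],
                   'C': ['a', 'b', 'a', 'd', 'd']})
print(df.xl.text())
# df.xl.clip() sends the same table to the clipboard
```

The call becomes df.xl.clip() rather than df.clip(), which is the price of not overwriting the built-in method.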

Instead of the object-oriented approach, you could write a function that does it and pass the DataFrame as a parameter.
Alternatively, instead of modifying the DataFrame class itself, you can create a class that contains your DataFrame and define a method that does what you want.
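Both suggestions fit in a few lines. The names clipxl and ClipFrame are hypothetical; neither approach touches the DataFrame class, so the built-in DataFrame.clip keeps working:

```python
import pandas as pd


def clipxl(df: pd.DataFrame) -> None:
    """Copy df to the clipboard as tab-separated text without the index."""
    df.to_clipboard(excel=True, index=False)


class ClipFrame:
    """Hypothetical container: holds a DataFrame and adds a clipboard method."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def clip(self) -> None:
        clipxl(self.df)


df = pd.DataFrame({'A': ['x', 'y'], 'B': [1, 2]})
wrapped = ClipFrame(df)
# clipxl(df)       # function style
# wrapped.clip()   # method style; the frame itself stays available as wrapped.df
```

The function is the least intrusive option; the container is handy if you want to collect several such helpers in one place.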

Related

How do I make a Kivy app with a button that creates a tuple?

I've written some simple code that creates a tuple from data generated by a get_data function (purely random values). My goal is to have a screen with three buttons that I can push, each of them generating random values in a df to be presented on the same screen. The random data is generated like this:
import pandas as pd
import numpy as np
def get_data(size=1000):
    df = pd.DataFrame()
    df['col1'] = np.random.randint(0, 50, size)
    df['col2'] = np.random.randint(0, 50, size)
    df['col3'] = np.random.rand(size)  # use size here too instead of a hard-coded 1000
    print("df")
    return df
print(get_data(size=1000))
one = get_data()
# each comparison needs its own parentheses because & binds tighter than <
one = one[(one.col3 < 1) & (one.col3 > 0.9)]
test8 = tuple(one.itertuples(index=False, name=None))
result = test8
print("\nresult")
print(result)
How do I create a Kivy app that generates random numbers for each of the columns produced by my get_data function? I'd like a screen with a button for each of the columns above, where pushing a button generates tuples shown in some sort of display, e.g. like the ones on calculators...
I tried this, but no screen appears:
#:kivy 1.9.1
kv = '''
<Launch>:
    BoxLayout:
        Button:
            size:(80,80)
            size_hint:(None,None)
            text:"Click me"
            on_press: root.generate_random_data()
'''
from kivy.app import App
from kivy.uix.button import Button
from kivy.uix.label import Label
class Test(App):
    def press(self, size):
        df = pd.DataFrame()
        df['col1'] = np.random.randint(0, 50, size)
        df['col2'] = np.random.randint(0, 50, size)
        df['col3'] = np.random.rand(1000)
        # print("Pressed")
        return df
    def build(self):
        butt=Button(text="Click")
        butt.bind(on_press=self.press) #dont use brackets while calling function
        return butt
Test().run()
I got this error message:
File "C:\Users\...\.\...\tuple_test_new_1.py", line 60, in <module>
Test().run()
TypeError: expected sequence object with len >= 0 or a single integer
Any support from Kivy sharks out there would be much appreciated... ;o)
The solution I worked out looks something like this:
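import numpy as np
import pandas as pd
from kivy.app import App
from kivy.uix.button import Button
class MyDf(App):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # build all three columns at once instead of assigning DataFrames column by column
        a = pd.DataFrame({'col1': np.random.randint(0, 50, size=1000),
                          'col2': np.random.randint(0, 50, size=1000),
                          'col3': np.random.rand(1000)})
        self.df = tuple(a.itertuples(index=False, name=None))
    def press(self, instance):
        # Kivy passes the pressed button to the callback; in the attempt above it
        # landed in `size` and was handed to np.random.randint, causing the TypeError
        print(self.df)
    def build(self):
        butt = Button(text="Click")
        butt.bind(on_press=self.press)  # pass the method itself, do not call it
        return butt
MyDf().run()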
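In the solution above the tuple is built once in __init__, so every press prints the same rows. If each press should produce fresh values, as the question asks, one option is to move generation into a plain function and call it from the press handler. A sketch of that function, assuming the column names and value ranges of get_data; the name random_rows and the seed parameter are hypothetical, and the Kivy wiring is only indicated in a comment:

```python
import numpy as np
import pandas as pd


def random_rows(size=1000, seed=None):
    """Return a tuple of (col1, col2, col3) row tuples with fresh random values."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'col1': rng.integers(0, 50, size),  # ints in [0, 50)
        'col2': rng.integers(0, 50, size),
        'col3': rng.random(size),           # floats in [0, 1)
    })
    return tuple(df.itertuples(index=False, name=None))


# Inside the App, a handler such as
#     def press(self, instance):
#         print(random_rows(5))
# would create new values on every click (or could set a Label's text instead).
rows = random_rows(5, seed=0)
print(rows)
```

Keeping the data code free of Kivy also makes it easy to check on its own before wiring it to buttons.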

How do you add dataclasses as valid index values to a plotly chart?

I am trying to switch from the matplotlib pandas plotting backend to plotly. However, I am being held back by a common occurrence of this error:
TypeError: Object of type Quarter is not JSON serializable
Where Quarter is a dataclass in my codebase.
For a minimal example, consider:
from dataclasses import dataclass
import pandas as pd
pd.options.plotting.backend = "plotly"
@dataclass
class Foo:
    val: int
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
df.plot.scatter(x='x', y='y')
As expected, the above returns:
TypeError: Object of type Foo is not JSON serializable
Now, I don't expect plotly to be magical, but adding a __float__ magic method allows the Foo objects to be used with the matplotlib backend:
# This works
@dataclass
class Foo:
    val: int
    def __float__(self):
        return float(self.val)
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
df.plot.scatter(x='x', y='y')
How can I update my dataclass to allow for it to be used with the plotly backend?
You can get pandas to cast the column to float before invoking the plotting backend:
from dataclasses import dataclass
import pandas as pd
@dataclass
class Foo:
    val: int
    def __float__(self):
        return float(self.val)
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
df["x"].astype(float)  # check: astype(float) calls Foo.__float__ on each element
pd.options.plotting.backend = "plotly"
df.assign(x=lambda d: d["x"].astype(float)).plot.scatter(x='x', y='y')
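The same cast can be applied to every column plotly would fail to serialize, without naming the columns and without patching anything. A minimal sketch; the helper name floatify_unserializable is hypothetical, and it uses json.dumps on each object column's list of values as a serializability probe:

```python
from dataclasses import dataclass
import json

import pandas as pd


@dataclass
class Foo:
    val: int

    def __float__(self) -> float:
        return float(self.val)


def floatify_unserializable(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where object columns that json cannot encode are cast to float.

    Assumes every element of such a column supports float(), e.g. via __float__.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # non-object columns are left alone
        try:
            json.dumps(out[col].tolist())
        except TypeError:
            out[col] = out[col].astype(float)  # calls Foo.__float__ per element
    return out


df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
clean = floatify_unserializable(df)
# pd.options.plotting.backend = "plotly"
# clean.plot.scatter(x='x', y='y')
```

Working on a copy keeps the original Foo objects in df, so the rest of the codebase still sees dataclasses.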
Monkey patching
If you don't want to change your code, you can monkey patch plotly's implementation of the pandas plotting API (see the pandas documentation on plotting backends):
https://pandas.pydata.org/pandas-docs/stable/development/extending.html#plotting-backends
from dataclasses import dataclass
import json
import pandas as pd
import plotly
import wrapt
# plotly.plot is the function pandas calls when the plotting backend is "plotly"
@wrapt.patch_function_wrapper(plotly, 'plot')
def new_plot(wrapped, instance, args, kwargs):
    data, x = args[0], kwargs["x"]  # the DataFrame being plotted and the x column name
    try:
        json.dumps(data[x].tolist())  # a Series itself is never JSON serializable, so test its values
    except TypeError:
        data[x] = data[x].astype(float)  # note: modifies the caller's DataFrame in place
    return wrapped(*args, **kwargs)
@dataclass
class Foo:
    val: int
    def __float__(self):
        return float(self.val)
df = pd.DataFrame({'x': [Foo(i) for i in range(10)], 'y': list(range(10))})
pd.options.plotting.backend = "plotly"
df.plot.scatter(x='x', y='y')

Python: How to use a dataframe outside the class which is built inside class?

class Builder():
    def __init__(self, args, date, crv):
        ...
    def listInstruments(self, crv):
        df_list = pd.DataFrame(columns=['curve','instrument','quote'])
        for instrument, data in self.instruments.items():
            if 'BASIS' in instrument:
                for........
            else:
                for quote in data['QUOTES']:
                    new_row = {'curve':crv,'instrument':instrument,'quote':quote}
                    df = df_list.append(new_row,ignore_index=True)
        return df
Above is the code for reference. I would like to use df for further analysis outside this class. How can I print df outside this class? Please suggest.
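The df only exists inside listInstruments until the method returns it, so the caller has to create an instance, call the method and keep the return value. Two further problems in the snippet: df = df_list.append(...) never grows df_list, so only the last row would survive, and DataFrame.append was removed in pandas 2.0. A minimal sketch, assuming a simplified constructor that takes the instruments dict directly (the real __init__ body and the BASIS branch are not shown in the question, so the branch is skipped here); the sample instrument names are made up:

```python
import pandas as pd


class Builder:
    """Simplified stand-in: instruments maps a name to {'QUOTES': [...]}."""

    def __init__(self, instruments):
        self.instruments = instruments

    def listInstruments(self, crv):
        rows = []
        for instrument, data in self.instruments.items():
            if 'BASIS' in instrument:
                continue  # the original BASIS branch is elided, so it is skipped here
            for quote in data['QUOTES']:
                rows.append({'curve': crv, 'instrument': instrument, 'quote': quote})
        # Collect rows in a list and build the frame once at the end.
        return pd.DataFrame(rows, columns=['curve', 'instrument', 'quote'])


builder = Builder({'SWAP_5Y': {'QUOTES': [1.1, 1.2]}, 'FRA_3M': {'QUOTES': [0.5]}})
df = builder.listInstruments('USD')  # the returned frame now lives outside the class
print(df)
```

Alternatively the method could store the result on the instance (e.g. self.df = ...) so it can be read later as builder.df.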

Wrap a pandas DataFrame within a class

I'd like to wrap a pandas DataFrame in a custom class and extend it with custom methods, while still being able to use the common pandas syntax on it.
I implemented this:
import pandas as pd
class CustomDF():
    def __init__(self, df: pd.DataFrame):
        self.df = df
    def __getattr__(self, name):
        return getattr(self.df, name)
    def foo(self):
        print('foo')
        return
I'd like the following lines to work:
a = CustomDF(df)  # df is an existing DataFrame
a.iloc[0,1]
a.foo()
but if I try to access a column of the DataFrame by doing
print(a['column_name'])
I get the error "TypeError: 'CustomDF' object is not subscriptable".
Any idea how to get subscripting to work without having to access .df first?
Thanks!
Try inheriting from the pandas.DataFrame class, like so:
from pandas import DataFrame
class CustomDF(DataFrame):
    def foo(self):
        print('foo')
        return
a = CustomDF([0,1])
a.foo()
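If you would rather keep the wrapper than subclass, the reason a['column_name'] fails is that Python looks up special methods such as __getitem__ on the class, so the instance-level __getattr__ fallback is never consulted for them. Defining the needed dunder methods explicitly fixes that. A minimal sketch; which special methods to forward (__getitem__, __setitem__ and __len__ here) is an assumption you can extend:

```python
import pandas as pd


class CustomDF:
    def __init__(self, df: pd.DataFrame):
        self.df = df

    def __getattr__(self, name):
        # Only called for attributes not found normally, e.g. iloc, shape, head.
        if name == 'df':
            # Avoid infinite recursion if df is not set yet (e.g. during copy/pickle).
            raise AttributeError(name)
        return getattr(self.df, name)

    # Implicit special-method lookup bypasses __getattr__, so forward explicitly.
    def __getitem__(self, key):
        return self.df[key]

    def __setitem__(self, key, value):
        self.df[key] = value

    def __len__(self):
        return len(self.df)

    def foo(self):
        print('foo')


a = CustomDF(pd.DataFrame({'column_name': [1, 2], 'other': [3, 4]}))
print(a['column_name'])
print(a.iloc[0, 1])
a.foo()
```

Unlike the subclass, results such as a['column_name'] or a.head() come back as plain pandas objects, not CustomDF.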

What pandas DataFrame method tells IPython Notebook to display as HTML

I have created a class in which the main deliverable piece of data is stored in an attribute as a pandas DataFrame. I'd like the default display behavior of instances of this class to be the same as that of this DataFrame, particularly in IPython Notebook.
For example:
from pandas import DataFrame
class TestDFDisplay():
    def __init__(self):
        self.dataframe = DataFrame([[1, 2], [3, 4]])
tdf = TestDFDisplay()
When I run:
tdf.dataframe
I get an HTML version of:
0 1
0 1 2
1 3 4
When I run:
tdf
I get:
<__main__.TestDFDisplay instance at 0x000000001A836788>
I'd rather get the same HTML of:
0 1
0 1 2
1 3 4
Instead, I could:
from pandas import DataFrame
class TestDFDisplay():
    def __init__(self):
        self.dataframe = DataFrame([[1, 2], [3, 4]])
    def __getattr__(self, item):
        try:
            return object.__getattribute__(self, item)
        except AttributeError:
            try:
                return getattr(self.dataframe, item)
            except AttributeError:
                raise AttributeError(item)
tdf = TestDFDisplay()
but this is a very heavy-handed way of diverting each and every attribute lookup on the class instance to the DataFrame. It works, but I'd rather be more precise and do something like the following:
from pandas import DataFrame
class TestDFDisplay():
    def __init__(self):
        self.dataframe = DataFrame([[1, 2], [3, 4]])
    def __repr__(self):
        return self.dataframe.__repr__()
tdf = TestDFDisplay()
So when I run:
tdf
I get the text version (the same as displayed here) rather than the HTML version I want:
0 1
0 1 2
1 3 4
That's OK. It just means that __repr__ isn't the method IPython Notebook calls on the DataFrame to display the HTML.
My question is: what is the correct method I should be redirecting to the DataFrame?
You want _repr_html_. It's what IPython/Jupyter looks for when using the rich (HTML) display system.
So in your class:
from pandas import DataFrame
class TestDFDisplay():
    def __init__(self):
        self.dataframe = DataFrame([[1, 2], [3, 4]])
    def _repr_html_(self):
        return self.dataframe._repr_html_()
should work.
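To have both the console and the notebook delegate to the wrapped frame, the two hooks can sit side by side. A minimal sketch combining the question's __repr__ with _repr_html_; note that DataFrame._repr_html_ returns None when the display.notebook_repr_html option is turned off, in which case IPython falls back to the text repr:

```python
from pandas import DataFrame


class TestDFDisplay:
    def __init__(self):
        self.dataframe = DataFrame([[1, 2], [3, 4]])

    def __repr__(self):
        # Plain-text display: console, print(repr(obj)), text/plain in notebooks.
        return repr(self.dataframe)

    def _repr_html_(self):
        # Rich display: IPython/Jupyter calls this when it exists.
        return self.dataframe._repr_html_()


tdf = TestDFDisplay()
```

Evaluating tdf as the last expression of a notebook cell should then render the same HTML table as tdf.dataframe.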