I'm using pandas and trying to resample a series:
import pandas as pd
index = pd.date_range('1/1/2000', periods=9, freq='T')
series = pd.Series(range(9), index=index)
series.resample('3T').mean()
But I get this error:
ImportError: cannot import name 'ResamplerWindowApply' from 'pandas.core.apply' (C:\Users\XXX\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\apply.py)
Any hints?
I was wondering how one would create a 3D scatter chart in Taipy.
I tried this code initially:
import pandas as pd
import numpy as np
from taipy import Gui
df = pd.DataFrame(np.random.randint(0,100,size=(100, 3)), columns=list('xyz'))
df['cluster1']=np.random.randint(0,3,100)
my_page ="""
Creation of a 3-D chart:
<|{df}|chart|type=Scatter3D|x=x|y=y|z=z|mode=markers|color=cluster|>
"""
Gui(page=my_page).run()
This does indeed display a 3D plot, but the colors (clusters) do not show up.
Any hint?
Yes, you need some massaging of your dataframes to do it.
Here's some sample code that achieves this:
import pandas as pd
import numpy as np
from taipy import Gui
df = pd.DataFrame(np.random.randint(0,100,size=(100, 3)), columns=list('xyz'))
df['cluster1']=np.random.randint(0,3,100)
# Create a list of 3 dataframes, one per cluster
datas = [df[df['cluster1']==i] for i in range(3)]
properties = {}
# Build the property dictionary dynamically.
# str(i) is the position of a dataframe in the datas list;
# "/x" selects the x column of that dataframe.
for i in range(len(datas)):
    properties[f"x[{i+1}]"] = str(i) + "/x"
    properties[f"y[{i+1}]"] = str(i) + "/y"
    properties[f"z[{i+1}]"] = str(i) + "/z"
    properties[f"name[{i+1}]"] = str(i+1)
print(properties)
chart = "<|{datas}|chart|type=Scatter3D|properties={properties}|mode=markers|height=800px|>"
Gui(page=chart).run()
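To make the mapping concrete, here is a minimal sketch of the same loop run on its own, without Taipy or pandas, so the resulting dictionary can be inspected directly. The count of three dataframes comes from the example above; the name n_frames is an illustrative assumption standing in for len(datas).

```python
# Standalone sketch of the property-building loop above.
# n_frames is a hypothetical stand-in for len(datas).
n_frames = 3

properties = {}
for i in range(n_frames):
    # In the example above, property names are numbered from 1,
    # while positions in the list of dataframes are numbered from 0.
    for axis in ("x", "y", "z"):
        properties[f"{axis}[{i+1}]"] = f"{i}/{axis}"
    properties[f"name[{i+1}]"] = str(i + 1)

print(properties["x[1]"])  # 0/x
print(properties["z[3]"])  # 2/z
```

Each trace therefore gets four entries: three column references and one display name.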
In fact, with the new release, Taipy 1.1, this is easy to do in a few lines of code:
import pandas as pd
import numpy as np
from taipy import Gui
color_map={0:"blue",1:'green', 2:"red"}
df = pd.DataFrame(np.random.randint(0,100,size=(100, 3)), columns=list('xyz'))
df['cluster1'] = np.random.randint(0,3,100)
df['cluster_colors'] = df.apply(lambda row: color_map[row.cluster1], axis=1)
marker = {"color":"cluster_colors"}
chart = "<|{df}|chart|type=Scatter3D|x=x|y=y|z=z|marker={marker}|mode=markers|height=800px|>"
Gui(page=chart).run()
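As a possible variation on the color assignment above, Series.map can look up each cluster label in color_map without a row-wise apply. This is only a sketch, assuming every label in cluster1 has an entry in color_map (a label without one would become NaN). The seeded generator is an addition made here purely so the sketch is reproducible.

```python
import numpy as np
import pandas as pd

color_map = {0: "blue", 1: "green", 2: "red"}

# Seeded generator: an assumption of this sketch, used only for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 3)), columns=list("xyz"))
df["cluster1"] = rng.integers(0, 3, size=100)

# Vectorized lookup: each cluster label is replaced by its color name.
df["cluster_colors"] = df["cluster1"].map(color_map)

# Compare against the row-wise apply used in the answer above.
row_wise = df.apply(lambda row: color_map[row.cluster1], axis=1)
matches = bool((df["cluster_colors"] == row_wise).all())
print(matches)
```

The chart definition itself does not change; only the way the cluster_colors column is computed differs.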
If you want to leave it to Taipy to pick the colors for you, then you can simply use:
import pandas as pd
import numpy as np
from taipy import Gui
df = pd.DataFrame(np.random.randint(0,100,size=(100, 3)), columns=list('xyz'))
df['cluster1'] = np.random.randint(0,3,100)
marker = {"color":"cluster1"}
chart = "<|{df}|chart|type=Scatter3D|x=x|y=y|z=z|marker={marker}|mode=markers|height=800px|>"
Gui(page=chart).run()
It seems like a simple task, but I've been having a hard time adding a border frame to my Excel-written table (using the xlsxwriter engine). The only way I've found is to get the size of my df and its starting row/column, then loop over each cell and format it, which is redundant. Is there a solution I'm not seeing? I tried the styleframe module in vain.
Reproducible example:
import pandas as pd
import numpy as np
from styleframe import StyleFrame, Styler, utils
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=list('AB'))
df = df.style.set_properties(**{'text-align': 'center'})
writer = StyleFrame.ExcelWriter("Test.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name= 'Random', index=False)
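# workbook and worksheet were not defined in the snippet; presumably taken from the writer
workbook = writer.book
worksheet = writer.sheets['Random']
format_x = workbook.add_format({'border': 2})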
worksheet.set_column('A:B',20,format_x)
writer.save()
Versions: pandas==1.2.4, Python 3.7
This doesn't change the formatting on column A:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.uniform(0, 1, 9).reshape(3, -1), columns=list('ABC'))
df.style.format({"A": '{:.1f}'})
print(df)
This works, however:
df['A'] = df['A'].map('{:.1f}'.format)
print(df)
So does this:
pd.set_option('display.float_format','{:.1f}'.format)
print(df)
Am I using the feature correctly?
I am using rpy2 to compute the comorbidity index of patients. I get the results, but I am not able to convert the output to a pandas DataFrame.
Below is the code:
# Create the DataFrame
import pandas as pd
data = {"person_id":[1,1,1,2,2,3],
        "dx_1":["F11","E40","","F32","C77","G10"],
        "dx_2":["F1P","E400","","F322","C737",""]}
df1 = pd.DataFrame(data)
# Convert the pandas DataFrame to an R data frame using rpy2
import rpy2
from rpy2.robjects import pandas2ri
import rpy2.robjects.numpy2ri
from rpy2.robjects.packages import importr
r_dataframe = pandas2ri.py2ri(df1)
print(r_dataframe)
# Import the 'comorbidity' R package using rpy2
R = rpy2.robjects.r
DTW = importr('comorbidity')
# Run the comorbidity function on one column, icd_1
output = DTW.comorbidity(x = r_dataframe, id = "person_id", code = "icd_1",
                         score = "charlson", assign0 = False,
                         icd = "icd10")
print(output)
But I am not able to convert the output to a pandas DataFrame:
import rpy2, rpy2.robjects as robjects, rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
#Converting data frames back and forth between rpy2 and pandas
from rpy2.robjects import r, pandas2ri
#convert output to pandas dataframe
pandas2ri.ri2py_dataframe(output)
I get the error below:
TypeError: Parameter 'categories' must be list-like, was
Please help.
Thanks in advance.
I am not able to figure out how to graph a candlestick OHLC chart with Python. Ever since matplotlib.finance was deprecated I've had this issue. Thanks for your help!
The DataFrame "quotes" is loaded from an Excel file (which I can't paste here), but it has the following columns:
Index(['Date', 'Open', 'High', 'Low', 'Close'], dtype='object')
I also have a default index. The values in the 'Date' column are of type pandas._libs.tslibs.timestamps.Timestamp.
When I run the code I get the following error:
File "", line 30, in
candlestick_ohlc(ax, zip(mdates.date2num(quotes.index.to_pydatetime()),
AttributeError: 'RangeIndex' object has no attribute 'to_pydatetime'
Here is my code:
import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import MONDAY, DateFormatter, DayLocator, WeekdayLocator
from mpl_finance import candlestick_ohlc
date1 = "2004-2-1"
date2 = "2004-4-12"
mondays = WeekdayLocator(MONDAY)
alldays = DayLocator()
weekFormatter = DateFormatter('%b %d')
dayFormatter = DateFormatter('%d')
fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
candlestick_ohlc(ax, zip(mdates.date2num(quotes.index.to_pydatetime()),
                         quotes['Open'], quotes['High'],
                         quotes['Low'], quotes['Close']),
                 width=0.6)
ax.xaxis_date()
ax.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45,
         horizontalalignment='right')
plt.show()
If you don't specify an index when building your DataFrame, it defaults to a RangeIndex that simply numbers the rows consecutively. A RangeIndex cannot be converted to dates, hence the error. The read_excel function takes an index_col parameter that specifies which column to use as the index. You might also have to pass parse_dates=True.
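A minimal sketch of the fix, assuming the column names from the question. Since the Excel file is not available, a small made-up DataFrame stands in for it, and set_index plays the role that index_col plays in read_excel. The file name in the comment is hypothetical.

```python
import pandas as pd

# Made-up stand-in for the spreadsheet; the values are illustrative only.
quotes = pd.DataFrame({
    "Date": pd.to_datetime(["2004-02-02", "2004-02-03", "2004-02-04"]),
    "Open": [10.0, 10.5, 10.2],
    "High": [10.8, 10.9, 10.6],
    "Low": [9.9, 10.1, 10.0],
    "Close": [10.5, 10.2, 10.4],
})
print(type(quotes.index).__name__)  # RangeIndex

# Use the Date column as the index. When loading from a file, the
# equivalent would be something like the line below, where
# "quotes.xlsx" is a hypothetical file name and Date is the first column:
#   quotes = pd.read_excel("quotes.xlsx", index_col=0, parse_dates=True)
quotes = quotes.set_index("Date")
print(type(quotes.index).__name__)  # DatetimeIndex

# The call that failed in the question now works.
dates = quotes.index.to_pydatetime()
print(dates[0])  # 2004-02-02 00:00:00
```

With a DatetimeIndex in place, the zip(...) argument passed to candlestick_ohlc in the question can be built unchanged.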