AttributeError: 'Styler' object has no attribute 'merge' - pandas

I have a problem: after I style my data (conditional formatting) with pandas, I can't merge the styled objects. My code and the error are below.
Can anyone give me some advice?
CODE:
cm = sns.diverging_palette(10, 140, s=99, l=50,
n=9, center="light", as_cmap=True)
df_style1 = df_b.style.background_gradient(cmap=cm)
df_style2 = df_c.style.background_gradient(cmap=cm)
df_last = df_style1.merge(df_style2, on= 'EKSPER_ADI', how='left')
ERROR:
AttributeError Traceback (most recent call last)
<ipython-input-148-d1b2ae3dc7a6> in <module>
4 df_style1 = df_b.style.background_gradient(cmap=cm)
5 df_style2 = df_c.style.background_gradient(cmap=cm)
----> 6 df_last = df_style1.merge(df_style1, on= 'EKSPER_ADI', how='left')
AttributeError: 'Styler' object has no attribute 'merge'

I don't think that is possible: a Styler wraps a DataFrame for display and has no merge method. Merge the DataFrames first, then apply the style:
df = df_b.merge(df_c, on= 'EKSPER_ADI', how='left')
df_style2 = df.style.background_gradient(cmap=cm)

Related

AttributeError: 'DataFrame' object has no attribute 'dtype' appeared suddenly

I have a df with features in Google Colab, and this error suddenly appeared:
Code:
df_features['cooling'] = df['cooling'].astype('object')
df_features['view'] = df['view'].astype('object')
cat_features = ['cooling', 'view', 'city_region']
X = df_features.drop('target', axis=1)
y = df_features['target']
num_cols = [col for col in X.columns if X[col].dtype in ['float64','int64']]
cat_cols = [col for col in X.columns if X[col].dtype not in ['float64','int64']]
Here is the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in ()
5 y = df_features['target']
6
----> 7 num_cols = [col for col in X.columns if X[col].dtype in ['float64','int64']]
8 cat_cols = [col for col in X.columns if X[col].dtype not in ['float64','int64']]
9
1 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'dtype'
I already tried !pip install --upgrade pandas, but it did not help.
Calling X[col] seems to return a DataFrame rather than a Series. I'm not sure why, because you did not provide the full structure and data of your DataFrame.
.dtype is for a pandas Series: https://pandas.pydata.org/docs/reference/api/pandas.Series.dtype.html
.dtypes is for a pandas DataFrame (and also works on a Series): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html
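
One common way for X[col] to return a DataFrame is a duplicated column name. The asker's real data is unknown, so the following is only a sketch on a made-up frame with a repeated "view" column. It also uses select_dtypes to split numeric and non-numeric columns without calling .dtype on each column:

```python
import pandas as pd

# Hypothetical data; the column label "view" is deliberately duplicated.
X = pd.DataFrame({"area": [1.0, 2.0], "view": ["a", "c"], "view2": ["b", "d"]})
X.columns = ["area", "view", "view"]

# With a duplicated label, X["view"] is a DataFrame, which has no .dtype.
dup_type = type(X["view"]).__name__
unique_type = type(X["area"]).__name__

# Keep the first occurrence of each label, then split columns by dtype.
X = X.loc[:, ~X.columns.duplicated()]
num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = X.select_dtypes(exclude="number").columns.tolist()
print(dup_type, unique_type, num_cols, cat_cols)
```

If X.columns.duplicated().any() is False for your frame, the cause is something else and this sketch does not apply.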

Pandas function with 2 arguments to find threshold

I need to count the people whose gain is greater than or equal to a threshold.
The dataframe contains a column 'capitalGain' with values such as 10, 20, 30, 50, 1000, 5000, 10000, etc.
I tried:
Function:
def get_num_people_with_higher_gain(dataframe, threshold_gain):
    threshold_gain = dataframe["capitalGain"][dataframe["capitalGain"] >= threshold_gain].count()
    return threshold_gain
Call function
df = get_num_people_with_higher_gain(dataframe, threshold_gain)
But I get the following error message:
NameError Traceback (most recent call last)
<ipython-input-50-5485c90412c8> in <module>
----> 1 df = get_num_people_with_higher_gain(dataframe, threshold_gain)
2 threshold = get_num_people_with_higher_gain(dataframe, threshold_gain)
NameError: name 'dataframe' is not defined
Since the function takes 2 arguments (dataframe, threshold_gain), does that mean both have to be defined somehow inside the function?
Thanks
Finally, here is the solution:
def get_num_people_with_higher_gain(dataframe, threshold_gain):
    result = len(dataframe[dataframe["capitalGain"] >= threshold_gain])
    return result
result = get_num_people_with_higher_gain(dataframe,60000)
result
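
The NameError in the question is raised at the call site, not inside the function. Parameter names exist only within the function body, so whatever is passed in must already be defined in the calling scope. A minimal sketch, assuming made-up data and a hypothetical variable name people:

```python
import pandas as pd

def get_num_people_with_higher_gain(dataframe, threshold_gain):
    """Count rows whose capitalGain is at least threshold_gain."""
    return int((dataframe["capitalGain"] >= threshold_gain).sum())

# Made-up sample data; the caller's variable can have any name.
people = pd.DataFrame({"capitalGain": [10, 20, 30, 50, 1000, 5000, 10000]})

# The arguments are defined here, outside the function, and passed in.
count = get_num_people_with_higher_gain(people, 1000)
print(count)
```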

Error: len() of unsized object - Wilcoxon signed-rank test

I am running a Wilcoxon signed-rank test on a dataset that looks like:
df = {'Year': ['2019','2018','2017', ....], 'Name':{jon, tim, luca,...}, 'SelfPromotion': [1,0,1,...]}
the script is as follows:
import pandas
from scipy.stats import mannwhitneyu
data1 = df['SelfPromotion']=1
data2 = df['SelfPromotion']=0
print(mannwhitneyu(data1, data2))
this gives me the following error:
TypeError: len() of unsized object
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-e49d9838e5ac> in <module>
3 data1 = data['SelfPromotion']=1
4 data2 = data['SelfPromotion']=0
----> 5 print(mannwhitneyu(data11, data22))
~/opt/anaconda3/envs/shityaar/lib/python3.7/site-packages/scipy/stats/stats.py in mannwhitneyu(x, y, use_continuity, alternative)
6391 x = np.asarray(x)
6392 y = np.asarray(y)
-> 6393 n1 = len(x)
6394 n2 = len(y)
6395 ranked = rankdata(np.concatenate((x, y)))
TypeError: len() of unsized object
I have tried every possible solution for this error by looking at similar questions but unfortunately, no solution could get it to work. I would appreciate some help.
mannwhitneyu expects array-like arguments, but you are passing integers, hence the failure.
Do something like this instead:
In [26]: data1 = df['SelfPromotion'] == 1
In [28]: data2 = df['SelfPromotion'] == 0
In [31]: mannwhitneyu(data1, data2)
Out[31]: MannwhitneyuResult(statistic=3.0, pvalue=0.30962837708843105)
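
To see why the original lines passed integers: data1 = df['SelfPromotion'] = 1 is a chained assignment, so it binds the integer 1 to data1 and also overwrites the whole column. A small sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SelfPromotion": [1, 0, 1, 0]})  # made-up data

# Chained assignment: data1 becomes the integer 1, and so does every row.
data1 = df["SelfPromotion"] = 1
column_after = df["SelfPromotion"].tolist()

# np.asarray(1) is a 0-dimensional array, so len() raises TypeError.
try:
    len(np.asarray(data1))
    error = None
except TypeError as exc:
    error = str(exc)
print(data1, column_after, error)
```

Note that df['SelfPromotion'] == 1 yields a boolean mask. To compare measurements between the two groups, the mask would normally be used to select rows of the column being tested, which the question does not show.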

How to get rid of "AttributeError: 'float' object has no attribute 'log2' "

Say I have a data frame with a column whose min value is 36884326.0 and max value is 6619162563.0, which I need to plot as a box plot. I tried to log-transform the values as follows:
diff["values"] = diff['value'].apply(lambda x: (x+1))
diff["log_values"] = diff['values'].apply(lambda x: x.log2(x))
However, the above lines throw the following error:
AttributeError Traceback (most recent call last)
<ipython-input-28-fe4e1d2286b0> in <module>
1 diff['value'].max()
2 diff["values"] = diff['value'].apply(lambda x: (x+1))
----> 3 diff["log_values"] = diff['values'].apply(lambda x: x.log2(x))
~/software/anaconda/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
3192 else:
3193 values = self.astype(object).values
-> 3194 mapped = lib.map_infer(values, f, convert=convert_dtype)
3195
3196 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-28-fe4e1d2286b0> in <lambda>(x)
1 diff['value'].max()
2 diff["values"] = diff['value'].apply(lambda x: (x+1))
----> 3 diff["log_values"] = diff['values'].apply(lambda x: x.log2(x))
AttributeError: 'float' object has no attribute 'log2'
Any suggestions would be great. Thanks
You need to apply the numpy.log2 function; a float has no log2 method. Please check its syntax in the NumPy documentation.
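
As a sketch, using only the min and max values quoted in the question as sample data, the transform can be applied to the whole column at once:

```python
import numpy as np
import pandas as pd

# Sample data: just the min and max values mentioned in the question.
diff = pd.DataFrame({"value": [36884326.0, 6619162563.0]})

# np.log2 is vectorized, so no apply/lambda is needed.
diff["log_values"] = np.log2(diff["value"] + 1)
print(diff)
```

The element-wise form diff['values'].apply(np.log2) also works, but the vectorized call is usually shorter and faster.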

Error using np.NaN in vectorized functions

I am using Python 3 on 64-bit Windows 10. I had issues with the following simple function:
def skudiscounT(t):
    s = t.find("ITEMADJ")
    if s >= 0:
        t = t[s + 8:]
        if t.find("-") == 2:
            return t
    else:
        return np.nan # if change to "" it will work fine!
I tried to use this function with np.vectorize and got the following error:
Traceback (most recent call last):
File "C:/Users/lz09/Desktop/P3/SODetails_Clean_V1.py", line 45, in <module>
SO["SKUDiscount"] = np.vectorize(skudiscounT)(SO['Description'])
File "C:\PD\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2739, in __call__
return self._vectorize_call(func=func, args=vargs)
File "C:\PD\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2818, in _vectorize_call
res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: could not convert string to float: '23-126-408'
When I replace the last line [return np.nan] with [return ''], it works fine. Does anyone know why this is the case? Thanks!
Without otypes the dtype of the return array is determined by the first trial result:
In [232]: f = np.vectorize(skudiscounT)
In [234]: f(['abc'])
Out[234]: array([ nan])
In [235]: _.dtype
Out[235]: dtype('float64')
I'm trying to find an argument that returns a string. It looks like your function can also return None.
From the docs:
The data type of the output of vectorized is determined by calling
the function with the first element of the input. This can be avoided
by specifying the otypes argument.
With otypes:
In [246]: f = np.vectorize(skudiscounT, otypes=[object])
In [247]: f(['abc', '23-126ITEMADJ408'])
Out[247]: array([nan, None], dtype=object)
In [248]: f = np.vectorize(skudiscounT, otypes=['U10'])
In [249]: f(['abc', '23-126ITEMADJ408'])
Out[249]:
array(['nan', 'None'],
dtype='<U4')
But for returning a generic object dtype, I'd use the slightly faster:
In [250]: g = np.frompyfunc(skudiscounT, 1,1)
In [251]: g(['abc', '23-126ITEMADJ408'])
Out[251]: array([nan, None], dtype=object)
So what kind of array do you want: float, which can hold np.nan; string; or object, which can hold 'anything'?
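
If an object array is acceptable, a minimal sketch could look like this. The function sku_discount is a hypothetical rewrite of skudiscounT that returns np.nan on every non-matching path instead of falling through to None, and the input strings are made up:

```python
import numpy as np

def sku_discount(text):
    # Hypothetical variant of skudiscounT: returns a str or np.nan, never None.
    start = text.find("ITEMADJ")
    if start >= 0:
        rest = text[start + 8:]
        if rest.find("-") == 2:
            return rest
    return np.nan

# otypes=[object] stops np.vectorize from inferring float64 from the first result.
f = np.vectorize(sku_discount, otypes=[object])
out = f(["abc", "ITEMADJ 23-126-408"])
print(out)
```

The object dtype lets strings and np.nan share one array, which is also how pandas stores a column that mixes strings with missing values.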