How to call functions and create a new function - pandas

I have created two functions that each return a dataframe. I want to create a third function that merges the dataframes from func1 and func2 and manipulates the data there. How can I call the functions and merge their results? The way I called them doesn't work for me:
def func1():
    return df1

def func2():
    return df2

def func3():
    func1()  # the returned dataframe is discarded here
    func2()  # and here

Your question is not entirely clear, but I think you want to keep what the two functions return and combine it. Use merge:
def func3():
    df = func1().merge(func2())
    # do something with df
    return df
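A minimal, self-contained sketch of the answer above. The sample frames, their columns, the shared `id` key, and the `score_doubled` manipulation are all made up for illustration, not taken from the question.

```python
import pandas as pd

def func1():
    # Hypothetical data standing in for the asker's df1.
    return pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

def func2():
    # Hypothetical data standing in for the asker's df2.
    return pd.DataFrame({"id": [2, 3, 4], "score": [10, 20, 30]})

def func3():
    # Call both functions and keep their return values by merging them.
    df = func1().merge(func2(), on="id")  # merge defaults to an inner join
    df["score_doubled"] = df["score"] * 2  # example manipulation
    return df

result = func3()
print(result)
```

Only the ids present in both frames survive the default inner join; passing `how="outer"` or `how="left"` to `merge` would keep more rows.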

Related

Series.agg() works differently when passing function

I call a function inside agg() of a Series. In the snippet below, the first call prints an int for the variable "a", and the second call prints a Series. I cannot figure out the reason for this behaviour.
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5])

def find_second_last(a):
    print(a)
    return a.iloc[-2]

ser.agg(find_second_last)
.iloc with a single position (no inner list brackets) returns a scalar, here an int:
a.iloc[[-2]]  # returns a pd.Series
a.iloc[-2]    # returns an int
a.iloc[1:]    # returns a pd.Series
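The three indexing forms can be checked directly on the Series from the question. This is only a small sketch; the variable names `scalar`, `one_row`, and `tail` are introduced here for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5])

scalar = ser.iloc[-2]     # single position: a scalar (a NumPy integer)
one_row = ser.iloc[[-2]]  # list of positions: a Series of length 1
tail = ser.iloc[1:]       # slice: a Series

print(type(scalar).__name__, type(one_row).__name__, type(tail).__name__)
```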

How to check if the input is a regular dataframe or groupby object?

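I am looking for a function that can check whether the input object is a dataframe or a groupby object.
def fun(input_object):
    if is_groupby_object(input_object):
        pass  # do something with the groupby object
    else:
        pass  # otherwise do something with the dataframe
You can try this. Groupby objects have an ngroups attribute and dataframes do not, so the attribute lookup tells them apart:
import pandas as pd

def is_groupby_object(obj):
    try:
        return obj.ngroups > 0
    except AttributeError:
        # obj has no ngroups attribute, so it is not a groupby object
        return False

if is_groupby_object(input_object):
    pass  # do something with the groupby object
elif isinstance(input_object, pd.DataFrame):
    pass  # otherwise do something with the dataframe
else:
    pass  # neither a groupby object nor a dataframe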
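An alternative to probing for an attribute is an explicit `isinstance` check against the groupby classes. The sketch below assumes that `DataFrameGroupBy` and `SeriesGroupBy` can be imported from `pandas.core.groupby`, which is an internal module path and may move between pandas versions. `describe_input` is a hypothetical helper name, and the sample data is made up.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy, SeriesGroupBy

def describe_input(input_object):
    """Hypothetical helper: report which kind of object was passed in."""
    if isinstance(input_object, (DataFrameGroupBy, SeriesGroupBy)):
        return "groupby"
    if isinstance(input_object, pd.DataFrame):
        return "dataframe"
    return "other"

df = pd.DataFrame({"key": ["x", "x", "y"], "value": [1, 2, 3]})
print(describe_input(df.groupby("key")))
print(describe_input(df))
print(describe_input([1, 2, 3]))
```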

How to call custom function in Pandas

I have defined a custom function to correct outliers in one of my dataframe's columns. The function works as expected, but I don't know how to call it on the dataframe. Could you please help me solve this?
Below is my custom function:
def corr_sft_outlier(in_bhk, in_sft):
    # 20th and 90th percentiles of avg_sft for rows with the same bhk_size
    bhk_band = np.quantile(outlierdf2[outlierdf2.bhk_size == in_bhk]['avg_sft'], (.20, .90))
    lower_band = round(bhk_band[0])
    upper_band = round(bhk_band[1])
    if (in_sft >= lower_band) & (in_sft <= upper_band):
        return in_sft
    elif in_sft < lower_band:
        return lower_band
    elif in_sft > upper_band:
        return upper_band
    else:
        return None
I am calling this function in the two ways below, but neither works:
outlierdf2[['bhk_size','avg_sft']].apply(corr_sft_outlier)
outlierdf2.apply(corr_sft_outlier(outlierdf2['bhk_size'],outlierdf2['avg_sft']))
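Apply the function row by row with axis=1, so that the lambda receives one row at a time and can pass both column values:
outlierdf2['adj_avg_sft'] = outlierdf2.apply(lambda x: corr_sft_outlier(x['bhk_size'], x['avg_sft']), axis=1)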
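A runnable sketch of the row-wise call, using made-up sample data. The function body here is a simplified stand-in for the asker's version: it clips to the same 20th and 90th percentile band but skips the rounding.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the column names follow the question.
outlierdf2 = pd.DataFrame({
    "bhk_size": [2, 2, 2, 2, 2],
    "avg_sft": [100.0, 500.0, 520.0, 540.0, 5000.0],
})

def corr_sft_outlier(in_bhk, in_sft):
    # Simplified stand-in: clip in_sft to the 20%-90% band of its bhk_size group.
    group = outlierdf2.loc[outlierdf2["bhk_size"] == in_bhk, "avg_sft"]
    lower_band, upper_band = np.quantile(group, (0.20, 0.90))
    return min(max(in_sft, lower_band), upper_band)

# axis=1 hands each row to the lambda as a Series indexed by column name.
outlierdf2["adj_avg_sft"] = outlierdf2.apply(
    lambda row: corr_sft_outlier(row["bhk_size"], row["avg_sft"]), axis=1
)
print(outlierdf2)
```

The asker's first attempt fails because, without axis=1, apply passes whole columns to a function that expects two scalar arguments; the second attempt calls the function once up front and passes its result to apply instead of a callable.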

Should I use classes for pandas.DataFrame?

I have more of a general question. I've written a couple of functions that transform data successively:
def func1(df):
    pass
...
def main():
    df = pd.read_csv()
    df1 = func1(df)
    df2 = func2(df1)
    df3 = func3(df2)
    df4 = func4(df3)
    df4.to_csv()

if __name__ == "__main__":
    main()
Is there a better way of organizing the logic of my script?
Should I use classes for cases like this when everything is tied to one dataset?
It depends on your use case. From what I understand, I would use a dictionary of the functions that process a df.
For instance:
function_returning_a_df = {"f1": func1, "f2": func2, "f3": func3}
df = pd.read_csv(csv)
# if this df needs 3 functions to be applied
df_processing = ["f1", "f2", "f3"]  # functions will be applied in this order
# If you need to keep the df at every step, you can collect them in a list
dfs_processed = []
for func in df_processing:
    dfs_processed.append(df)  # if you want to save all steps
    df = function_returning_a_df[func](df)
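The dictionary approach can be sketched end to end as follows. The three step functions and the sample data are hypothetical placeholders for the asker's own transformations, and an in-memory frame replaces the CSV file.

```python
import pandas as pd

# Hypothetical processing steps; each takes a dataframe and returns a new one.
def func1(df):
    return df.dropna()

def func2(df):
    return df.assign(total=df["a"] + df["b"])

def func3(df):
    return df[df["total"] > 3]

function_returning_a_df = {"f1": func1, "f2": func2, "f3": func3}
df_processing = ["f1", "f2", "f3"]  # applied in this order

df = pd.DataFrame({"a": [1, 2, None], "b": [1, 2, 3]})
dfs_processed = []  # snapshot of the input to every step
for name in df_processing:
    dfs_processed.append(df)
    df = function_returning_a_df[name](df)

print(df)
```

Keeping the steps as names in a list makes it easy to reorder or skip a step per dataset without touching the functions themselves.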

How to merge 4 dataframes into one

I have created functions that each return a dataframe. Now I want to merge all the dataframes into one. First I called all the functions, then I used reduce together with merge. It did not work as expected. The error I am getting is "cannot combine function. It should be dataframe or series". I checked the type of my dfs: they are dataframes, not functions. I don't know where the error is coming from.
def func1():
    return df1

def func2():
    return df2

def func3():
    return df3

def func4():
    return df4

def alldfs():
    df_1 = func1()
    df_2 = func2()
    df_3 = func3()
    df_4 = func4()
    result = reduce(lambda df_1,d_2,df_3,df_4: pd.merge(df_1,df_2,df_3,df_4,on ="EMP_ID"),[df1,df2,df3,df4])
    print(result)
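reduce calls its function with two arguments at a time, and pd.merge joins exactly two dataframes per call, so the four-argument lambda cannot work. You could chain the merges instead (assuming that EMP_ID is common to all dataframes and that you want only the rows present in every one of them):
result = pd.merge(df1, df2, on='EMP_ID').merge(df3, on='EMP_ID').merge(df4, on='EMP_ID')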
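If you prefer to keep reduce, the lambda needs exactly two parameters: the result merged so far and the next dataframe. A minimal sketch with made-up frames that share the EMP_ID key:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames sharing the EMP_ID key.
df1 = pd.DataFrame({"EMP_ID": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
df2 = pd.DataFrame({"EMP_ID": [1, 2, 3], "dept": ["HR", "IT", "IT"]})
df3 = pd.DataFrame({"EMP_ID": [2, 3, 4], "salary": [50, 60, 70]})
df4 = pd.DataFrame({"EMP_ID": [2, 3], "city": ["Oslo", "Rome"]})

# reduce feeds the running result and the next frame into the lambda.
result = reduce(
    lambda left, right: pd.merge(left, right, on="EMP_ID"),
    [df1, df2, df3, df4],
)
print(result)
```

This is equivalent to the chained merge above and scales to any number of dataframes in the list.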