Conditional on pandas DataFrames - pandas

Let df1, df2, and df3 be pandas DataFrames that have the same structure but different numerical values. I want to compute, element by element:
res=if df1>1.0: (df2-df3)/(df1-1) else df3
res should have the same structure as df1, df2, and df3.
numpy.where() returns a bare NumPy array, so the index and columns are lost.
Edit 1:
res should have the same indices as df1, df2, and df3.
For example, I can access df2 as df2["instanceA"]["parameter1"]["paramter2"]. I want to access the new calculated DataFrame/Series res as res["instanceA"]["parameter1"]["paramter2"].

Actually, numpy.where should work fine here. The output below is 4x2 (the same shape as df1, df2, and df3).
import numpy as np
import pandas as pd
df1 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )
df2 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )
df3 = pd.DataFrame( np.random.randn(4,2), columns=list('xy') )
res = df3.copy()
res[:] = np.where( df1 > 1, (df2-df3)/(df1-1), df3 )
x y
0 -0.671787 -0.445276
1 -0.609351 -0.881987
2 0.324390 1.222632
3 -0.138606 0.955993
Note that this works for both Series and DataFrames. Assigning through res[:] writes the values into the existing object, which keeps its index and columns. Without the [:], res would be rebound to a plain NumPy array rather than a Series or DataFrame.
Alternatively, for a Series you could write, as Kadir does in his answer:
res = pd.Series(np.where( df1>1, (df2-df3)/(df1-1), df3 ), index=df1.index)
Or similarly for a dataframe you could write:
res = pd.DataFrame(np.where( df1>1, (df2-df3)/(df1-1), df3 ), index=df1.index,
columns=df1.columns)
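A pandas-native alternative, sketched here on small made-up frames (the index labels and values are assumptions for illustration), is DataFrame.where(cond, other): it keeps the calling frame's values where cond is True and takes other everywhere else, so the result keeps the index and columns without a round trip through NumPy. Where df1 == 1 the division produces inf, but those entries are replaced by df3 because the condition is False there.

```python
import numpy as np
import pandas as pd

# Made-up inputs sharing the same index and columns.
idx = pd.Index(["r0", "r1"])
df1 = pd.DataFrame({"x": [2.0, 0.5], "y": [1.0, 3.0]}, index=idx)
df2 = pd.DataFrame({"x": [5.0, 1.0], "y": [2.0, 7.0]}, index=idx)
df3 = pd.DataFrame({"x": [1.0, 4.0], "y": [9.0, 1.0]}, index=idx)

# Keep (df2 - df3)/(df1 - 1) where df1 > 1, otherwise take df3.
res = ((df2 - df3) / (df1 - 1)).where(df1 > 1, df3)

# Same values as the np.where-based answer, for comparison.
via_numpy = df3.copy()
via_numpy[:] = np.where(df1 > 1, (df2 - df3) / (df1 - 1), df3)

print(res)
```

Because res is built from pandas objects, selections such as res["x"]["r1"] work exactly as they do on df2.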

Integrating the idea in this question into JohnE's answer, I have come up with this solution:
res = pd.Series(np.where( df1 > 1, (df2-df3)/(df1-1), df3 ), index=df1.index)
A better answer that uses DataFrames would be appreciated.

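Say df is your initial DataFrame, with the three inputs stored as columns df1, df2, and df3, and res is the new column. Use a combination of setting values and boolean indexing.
Set res to be a copy of df3:
df['res'] = df['df3']
Then adjust the values where your condition holds. Use .loc with the boolean mask; the chained form df[mask]['res'] = ... assigns to a temporary copy and leaves df unchanged:
mask = df['df1'] > 1.0
df.loc[mask, 'res'] = (df['df2'] - df['df3']) / (df['df1'] - 1)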

Related

Split html table to smaller Pandas DataFrames

I'm trying to parse the HTML tables on ukwtv.de into pandas DataFrames.
The challenge is that one table actually combines two or even three tables.
From the table I want to get:
TV program name and SID as df1,
Kanal, Standort, etc. as df2,
Technische Details as df3.
Here is what I have managed so far:
import pandas as pd
table_MN = pd.read_html('https://www.ukwtv.de/cms/deutschland-tv/schleswig-holstein-tv.html', thousands='.', decimal=',')
df1 = table_MN[1]
df1.columns = df1.columns.str.replace(" ", "_")
df1.columns = df1.columns.str.replace("\n", "_")
df1=df1.iloc[:7 , :]
for col in df1.columns:
    print(col)
    if '.' in col:
        df1.drop(col, axis=1, inplace=True)
df1.dropna(subset = ["TV-_und_Radio-Programme_des_Bouquets"],axis=0, inplace=True)
df1.head(15)
df2 = table_MN[1]
df2.columns = df2.iloc[7]
df2 = df2.iloc[8: , :]
df2 = df2.reset_index(drop=True)
df2.head(20)
There are two issues I have not been able to solve:
1. Row 7 is hard-coded. How can I recognize the blank line so the data is split into two DataFrames automatically?
2. The Technische Details column in df1 needs to be converted into a separate DataFrame in which Modulation, Guardintervall, ... are the column (Series) names.
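One possible approach to both issues, sketched on made-up data rather than the real page. It assumes (1) that read_html turns the separating blank line into a row where every cell is NaN, and (2) that each Technische Details cell holds "Key: value" pairs on separate lines. Neither assumption has been checked against ukwtv.de, and the helper names split_on_blank_rows and details_to_frame are hypothetical.

```python
import numpy as np
import pandas as pd


def split_on_blank_rows(df):
    """Hypothetical helper: split df at rows where every cell is NaN."""
    blank = df.isna().all(axis=1)
    # Each blank row increments the counter, so rows between two
    # separators share the same block id.
    block_id = blank.cumsum()
    return [
        chunk.reset_index(drop=True)
        for _, chunk in df[~blank].groupby(block_id[~blank])
    ]


def details_to_frame(details):
    """Hypothetical helper: turn 'Key: value' lines into one column per key."""
    pairs = details.str.extractall(r"(?P<key>[^:\n]+):\s*(?P<value>[^\n]+)")
    # extractall adds a 'match' index level; dropping it leaves one index
    # entry per original row, which pivot turns into one row per cell.
    return pairs.droplevel("match").pivot(columns="key", values="value")


# Toy stand-in for the scraped table: an all-NaN row separates two blocks.
raw = pd.DataFrame({
    "c0": ["prog A", "prog B", np.nan, "Kanal", "ch 1"],
    "c1": ["sid 1", "sid 2", np.nan, "Standort", "site 1"],
})
parts = split_on_blank_rows(raw)

# Toy Technische Details cells in the assumed "Key: value" format.
details = pd.Series([
    "Modulation: 16-QAM\nGuardintervall: 1/4",
    "Modulation: 64-QAM\nGuardintervall: 1/8",
])
tech = details_to_frame(details)
print(tech)
```

In the second block the first row ("Kanal", "Standort") is still data; it could be promoted to the header with chunk.columns = chunk.iloc[0], as the original code does with row 7.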

My pandas merge is not bringing over data from the right df. Why?

The code runs without error, but the data from the right DataFrame is not showing up in the result.
I've tried with and without the index, and neither seems to work. I checked the dtypes, and they appear to match for the fields I use as the index. The indicator says left_only, which makes me think the merge is not bringing anything over. It clearly isn't, because fields that are not null in the right DataFrame show up as null in the result.
df = df[(df['A'].notna())]
group = df.groupby(['A', 'B', 'Period', 'D'])
df2 = group['Monthly_Need'].sum()
df2 = df2.reset_index()
df = df.set_index(['A', 'B', 'Period', 'D'])
df2 = df2.set_index(['A', 'B', 'Period', 'D'])
df = df.merge(df2, how='left', left_index=True, right_index=True, indicator=True)
df = df.reset_index()
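If the goal is only to attach each group's total Monthly_Need to every row, groupby(...).transform("sum") may avoid the merge altogether, because it returns a result aligned to the original rows. The sketch below uses made-up data. It also counts rows whose key columns contain NaN: groupby drops such rows by default, so they would have no partner in the aggregated frame and would appear as left_only. That is one possible cause worth checking, not a confirmed diagnosis.

```python
import pandas as pd

# Made-up data with the same key columns as in the question.
df = pd.DataFrame({
    "A": ["a1", "a1", "a2"],
    "B": ["b1", "b1", "b1"],
    "Period": [1, 1, 1],
    "D": ["d1", "d1", "d1"],
    "Monthly_Need": [10.0, 5.0, 7.0],
})
keys = ["A", "B", "Period", "D"]

# Rows with NaN in any key are excluded from groupby results by default.
rows_with_nan_keys = int(df[keys].isna().any(axis=1).sum())

# Per-group total broadcast back to every row, no merge needed.
df["Monthly_Need_total"] = df.groupby(keys)["Monthly_Need"].transform("sum")
print(rows_with_nan_keys)
print(df)
```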

Calculate a new value using another dataframe

I am looking for a way to divide all columns in a DataFrame by the values of a column from another DataFrame. This can be done with either of the two options below.
df_amenity_normalized = df_amenity.apply(
lambda row: row / df_targets['Population'].loc[row.name], axis=1)
Or join the tables and then calculate:
ndf=df_amenity.merge(df_targets, left_index=True, right_index=True)
ndft=ndf.apply(lambda x: x/ndf.Population, axis='rows' )
df_amenity_normalized1 = ndft.drop(columns=['Population', 'GNI', 'GDP', 'BM Dollar', 'HDI'])
Is there any other way to achieve the same result?
The data is available here:
import pandas as pd
df_targets = pd.read_csv('https://raw.githubusercontent.com/njanakiev/osm-predict-economic-measurements/master/data/economic_measurements.csv', index_col='country')
df_targets.drop(columns='country_code', inplace=True)
df_targets = df_targets[['Population', 'GNI', 'GDP', 'BM Dollar', 'HDI']]
df_amenity = pd.read_csv('https://raw.githubusercontent.com/njanakiev/osm-predict-economic-measurements/master/data/country_amenity_counts.csv')
df_amenity.set_index('country', inplace=True)
df_amenity.drop(columns='country_code', inplace=True)
You can use the DataFrame.div() method with axis=0, which divides each row by the value in the Series that has the same index label:
df_amenity.div(df_targets['Population'], axis = 0)
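One detail worth knowing: div aligns on the index with an outer join, so a country that exists only in the divisor Series shows up as an all-NaN row. The toy frames below (made up, not the linked data) show this, and how reindexing the Series to the left frame's index keeps only the original rows.

```python
import pandas as pd

# Made-up stand-ins for df_amenity and df_targets['Population'].
amenity = pd.DataFrame(
    {"bank": [10, 40], "school": [20, 80]},
    index=pd.Index(["A", "B"], name="country"),
)
population = pd.Series(
    [100, 400, 50],
    index=pd.Index(["A", "B", "C"], name="country"),
    name="Population",
)

# Outer alignment: country C appears with NaN in every column.
normalized = amenity.div(population, axis=0)

# Restrict the divisor to the left frame's countries first.
normalized_aligned = amenity.div(population.reindex(amenity.index), axis=0)
print(normalized_aligned)
```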

Combine a list of pandas dataframes that do not have the same columns to one pandas dataframe

I have three DataFrames, df1, df2, and df3, with the same number of rows but different numbers of columns and different column labels. I want to merge them into a single DataFrame in the order df1, df2, df3, keeping the original column labels.
I've read in Combine a list of pandas dataframes to one pandas dataframe that this can be done by:
df = pd.DataFrame.from_dict(map(dict,df_list))
But I cannot fully understand the code. I assume df_list is:
df_list = [df1,df2,df3]
But what is dict? A dictionary of df_list? How do I get it?
I solved it with:
df = pd.concat([df1, df2], axis=1, sort=False)
df = pd.concat([df, df3], axis=1, sort=False)
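The two calls can be collapsed into one, since pd.concat accepts the whole list. A small sketch with made-up frames: with axis=1, rows are matched by index label, not by position, so a frame whose index differs (df3 here) needs reset_index(drop=True) or a shared index first, otherwise the result gets extra NaN-filled rows.

```python
import pandas as pd

# Made-up frames: same number of rows, different columns.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4], "c": [5, 6]})
df3 = pd.DataFrame({"d": [7, 8]}, index=[10, 11])  # different index labels

# One call, columns kept in the order df1, df2, df3.
df = pd.concat([df1, df2, df3.reset_index(drop=True)], axis=1)

# Without resetting, rows are aligned by label and the result has 4 rows.
misaligned = pd.concat([df1, df2, df3], axis=1)
print(df)
```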

Assign dataframes in a list to a list of names; pandas

I have a list of names
var=[name1,name2]
and a list of DataFrames
df= [df1, df2]
How do I assign df1 to name1, df2 to name2, and so on?
If I understand correctly, and assuming both lists have the same length, you can iterate over the indices and assign element by element. Note that this replaces the entries of the list var; it does not rebind the variables name1 and name2 themselves. For example:
In [412]:
name1,name2 = None,None
var=[name1,name2]
df1, df2 = 1,2
df= [df1, df2]
for x in range(len(var)):
    var[x] = df[x]
var
Out[412]:
[1, 2]
If your list stores the names as strings, I would not create variables from those strings (see How do I create a variable number of variables?); create a dict instead:
In [414]:
var=['name1','name2']
df1, df2 = 1,2
df= [df1, df2]
d = dict(zip(var,df))
d
Out[414]:
{'name1': 1, 'name2': 2}
To answer your question literally, you can do it this way:
for name, frame in zip(var, df):
    globals()[name] = frame
and then access your variables by name.
But this is bad practice: it injects names into your global namespace that appear nowhere in your code, which makes the program hard to follow and easy to break. It's better to stay in control of what you handle and keep your DataFrames in a list or dictionary.