loop through data frame

loop through data frame - pandas

I want to change the orders of data frames using for loop but it doesn't work. My code is as follows:
import pandas as pd
df1 = pd.DataFrame({'a':1, 'b':2}, index=[1])
df2 = pd.DataFrame({'c':3, 'c':4}, index=[1])
for df in [df1, df2]:
    # rebinds the loop variable only; df1 and df2 are not modified
    df = df.loc[:, df.columns.tolist()[::-1]]
But afterwards, the column order of df1 and df2 is unchanged.

You can use tuple unpacking (multiple assignment) with a list comprehension, i.e.:
df1, df2 = [i.loc[:, i.columns[::-1]] for i in [df1, df2]]
print(df1)
   b  a
1  2  1
print(df2)
   c
1  4

Note: In my answer I build up to showing that storing the dataframes in a dictionary is the best approach in the general case. If you want to mutate the original dataframe variables, @Bharath's answer is the way to go.
Answer:
The code doesn't work because df = ... inside the loop only rebinds the loop variable; the result is never written back into the list of dataframes. Here's how to fix that:
import pandas as pd
df1 = pd.DataFrame({'a':1, 'b':2}, index=[1])
df2 = pd.DataFrame({'c':3, 'c':4}, index=[1])
l = [df1, df2]
for i, df in enumerate(l):
    # store the reordered frame back at the same position in the list
    l[i] = df.loc[:, df.columns.tolist()[::-1]]
So the difference is that I iterate with enumerate to get each dataframe together with its index in the list, and then assign the changed dataframe back to its original position in the list.
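To see why the original loop has no effect, here is a minimal check. It relies on plain Python name binding and nothing pandas-specific. Assigning to df inside the loop rebinds only the loop variable. Assigning to l[i] stores the new frame in the list. In both cases, the name df1 still points at the old frame.

```python
import pandas as pd

df1 = pd.DataFrame({'a': 1, 'b': 2}, index=[1])
df2 = pd.DataFrame({'c': 4}, index=[1])

# Version 1: rebinding the loop variable; nothing outside the loop changes
for df in [df1, df2]:
    df = df.loc[:, df.columns[::-1]]
print(list(df1.columns))  # ['a', 'b']

# Version 2: writing each result back into the list by position
l = [df1, df2]
for i, df in enumerate(l):
    l[i] = df.loc[:, df.columns[::-1]]
print(list(l[0].columns))  # ['b', 'a']
print(list(df1.columns))   # ['a', 'b'] (df1 is a separate name, still the old frame)
```

This is why the list comprehension with unpacking, or a dictionary, is needed if you want to refer to the reordered frames by name.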
Execution details:
Before applying the change:
In [28]: for i in l:
    ...:     print(i.head())
    ...:
   a  b
1  1  2
   c
1  4
In [29]: for i, df in enumerate(l):
    ...:     l[i] = df.loc[:, df.columns.tolist()[::-1]]
    ...:
After applying the change:
In [30]: for i in l:
    ...:     print(i.head())
    ...:
   b  a
1  2  1
   c
1  4
Improvement proposal:
It's better to use a dictionary as follows:
import pandas as pd
d = {}
d['df1'] = pd.DataFrame({'a':1, 'b':2}, index=[1])
d['df2'] = pd.DataFrame({'c':3, 'c':4}, index=[1])
for i, df in d.items():
    d[i] = df.loc[:, df.columns.tolist()[::-1]]
Then you can reference your dataframes from the dictionary, for instance d['df1'].
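The same dictionary idea can also be written as a dict comprehension that builds the new dictionary in one expression. This is a sketch that redefines d so it runs on its own. The second frame uses a single 'c' column, which matches what the duplicated key produces.

```python
import pandas as pd

d = {
    'df1': pd.DataFrame({'a': 1, 'b': 2}, index=[1]),
    'df2': pd.DataFrame({'c': 4}, index=[1]),
}

# Build a new dict whose values are the frames with reversed column order;
# df[Index] selects the columns in the given order
d = {name: df[df.columns[::-1]] for name, df in d.items()}

print(d['df1'])
```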

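You can also reverse the values and the column labels in place, so df1 and df2 themselves change:
import pandas as pd
df1 = pd.DataFrame({'a':1, 'b': 2}, index=[1])
df2 = pd.DataFrame({'c':3, 'c': 4}, index=[1])
print('before')
print(df1)
for df in [df1, df2]:
    # writes through df.values, so this relies on it being a writable view of the
    # data (single dtype, copy-on-write off); otherwise it may raise or have no effect
    df.values[:, :] = df.values[:, ::-1]
    df.columns = df.columns[::-1]
print('after')
print(df1)
Output:
before
   a  b
1  1  2
after
   b  a
1  2  1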
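Writing into df.values only modifies the frame when .values is a view of its data. That is not guaranteed, for example with mixed dtypes or under copy-on-write. Below is a sketch of an in-place alternative that avoids this by moving columns with DataFrame.pop. It assumes the column labels are unique, so df2 here uses hypothetical columns 'c' and 'd' instead of the duplicated key.

```python
import pandas as pd

df1 = pd.DataFrame({'a': 1, 'b': 2}, index=[1])
df2 = pd.DataFrame({'c': 3, 'd': 4}, index=[1])  # hypothetical unique columns

for df in [df1, df2]:
    # Walk the columns from last to first; pop each one and re-append it at
    # the end. After the last step the order is reversed. pop() and column
    # assignment modify df itself, so df1 and df2 see the change.
    for col in list(df.columns[::-1]):
        df[col] = df.pop(col)

print(df1)
```

Each label keeps its own values here, which gives the same final frame as reversing both values and labels.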

Related

Change index of multiple dataframes at once

Here is the earlier part of my code:
import pandas as pd
import itertools as it
import numpy as np
a = pd.read_excel(r'D:\Ph.D. IEE\IEE 6570 - IR Dr. Greene\3machinediverging.xlsx')
question = pd.DataFrame(a).set_index('Job')
df2 = question.index
permutations = list(it.permutations(question.index))
dfper = pd.DataFrame(permutations)
for i in range(len(dfper)):
    fr = dfper.iloc[0:len(dfper)]
    fr.index.name = ''
print(fr)
for i in range(0, fr.shape[0], 1):
    print(fr.iloc[i:i+1].T)
This gives me 120 dataframes:
   0
0  A
1  B
2  C
3  D
4  E
   1
0  A
1  B
2  C
3  E
4  D
and so on...
I would like to change the index of these dataframes to the column of letters (using a for loop). Any help would be really appreciated. Thank you.

import pandas as pd
df1 = pd.DataFrame({0: [ 'A', 'B', 'C', 'D', 'E']})
df2 = pd.DataFrame({1: [ 'A', 'B', 'C', 'D', 'E']})
df_list = [df1, df2]
for df in df_list:
    # set the first column as index (inplace=True modifies each frame in the list)
    df.set_index(df.columns[0], inplace=True)

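I would organize all dataframes in a list:
df_list = [df1, df2, df3,...]
and then rebuild the list from the re-indexed frames (assigning to df inside a for loop would only rebind the loop variable):
df_list = [df.set_index(df.columns[0]) for df in df_list]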
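Below is a sketch that builds all 120 frames from the question's permutation table and sets the letters as the index in one pass. The list jobs is a hypothetical stand-in for question.index, which the question reads from its Excel file.

```python
import itertools as it

import pandas as pd

jobs = ['A', 'B', 'C', 'D', 'E']  # hypothetical stand-in for question.index
dfper = pd.DataFrame(list(it.permutations(jobs)))  # 120 rows x 5 columns

# Each row, transposed, is a 5x1 frame whose single column is labelled by the
# row number i; set_index(i) turns that column of letters into the index.
frames = [dfper.iloc[[i]].T.set_index(i) for i in range(len(dfper))]

print(len(frames))               # 120
print(frames[1].index.tolist())  # ['A', 'B', 'C', 'E', 'D']
```

Because the only column becomes the index, each resulting frame has an index of letters and no remaining columns. This matches what the answers above produce.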

Get max value of 3 columns from pandas DataFrame?

I have a Pandas DataFrame with 3 columns:
import pandas as pd
c = {'a': [['US']], 'b': [['US']], 'c': [['US','BE']]}
df = pd.DataFrame(c, columns=['a','b','c'])
Now I need the max value of these 3 columns.
I've tried:
df['max_val'] = df[['a','b','c']].max(axis=1)
The result is NaN instead of the expected output: US.
How can I get the max value for these 3 columns? (And what if one of them contains NaN?)

Use:
import pandas as pd
from collections import Counter
c = {'a': [['US', 'BE'], ['US']], 'b': [['US'], ['US']], 'c': [['US', 'BE'], ['US', 'BE']]}
df = pd.DataFrame(c, columns=['a', 'b', 'c'])
# lists are unhashable, so count them as tuples and keep the most common one per row
result = df[['a', 'b', 'c']].apply(lambda x: list(Counter(map(tuple, x)).most_common()[0][0]), axis=1)
print(result)
0 [US, BE]
1 [US]
dtype: object

If, as @Erfan stated, you want the most common value in each row, use .agg() with 'mode':
df.agg('mode', axis=1)
0
0 [US, BE]
1 [US]

Because your data are lists, you can't use pandas' mode() directly: list objects are unhashable, so mode() won't work.
One solution is to convert the elements of each row to strings and then use mode().
Check this:
>>> import pandas as pd
>>> c = {'a': [['US','BE']],'b': [['US']], 'c': [['US','BE']]}
>>> df = pd.DataFrame(c, columns = ['a','b','c'])
>>> x = df.iloc[0].apply(lambda x: str(x))
>>> x.mode()
0    ['US', 'BE']
dtype: object
>>> d = {'a': [['US']],'b': [['US']], 'c': [['US','BE']]}
>>> df2 = pd.DataFrame(d, columns = ['a','b','c'])
>>> z = df2.iloc[0].apply(lambda z: str(z))
>>> z.mode()
0    ['US']
dtype: object
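The string trick can be applied to every row and turned back into real lists with ast.literal_eval. This is a sketch. row_mode is a hypothetical helper name, and it assumes every cell holds a list of plain strings, so that its str() form can be parsed back.

```python
import ast

import pandas as pd

c = {'a': [['US', 'BE'], ['US']], 'b': [['US'], ['US']], 'c': [['US', 'BE'], ['US', 'BE']]}
df = pd.DataFrame(c, columns=['a', 'b', 'c'])


def row_mode(row):
    # str() makes each list hashable, e.g. ['US', 'BE'] -> "['US', 'BE']"
    as_str = row.map(str)
    # if several values tie, mode() returns all of them; take the first
    most_common = as_str.mode()[0]
    # parse the string back into a real list
    return ast.literal_eval(most_common)


result = df.apply(row_mode, axis=1)
print(result)
```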

Since some of your elements are lists, I think the code below will work:
First, collect all the values into one flat list.
Then, find the most frequent element in that list.
from collections import Counter
# flatten the lists from every cell of every column into one list
arr = []
for col in df:
    for cell in df[col]:
        arr.extend(cell)
b = Counter(arr)
print(b.most_common())
most_common() lists every element with its count, most frequent first. Note that this counts across the whole DataFrame, not per row.
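The same flattening can be done with itertools.chain, both for the whole frame and row by row. This is a sketch on the question's data. For the per-row version, it assumes you want the single most frequent element in each row.

```python
from collections import Counter
from itertools import chain

import pandas as pd

c = {'a': [['US']], 'b': [['US']], 'c': [['US', 'BE']]}
df = pd.DataFrame(c, columns=['a', 'b', 'c'])

# Whole frame: flatten every list in every cell, then count
overall = Counter(chain.from_iterable(df.to_numpy().ravel()))
print(overall.most_common(1))  # [('US', 3)]

# Per row: flatten the lists in each row and keep the most frequent element
per_row = df.apply(
    lambda row: Counter(chain.from_iterable(row)).most_common(1)[0][0],
    axis=1,
)
print(per_row)
```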

How to find pearson correlation between rows in two dataframes

I have a dataframe that I split into two dataframes (df1 and df2) with the same number of columns and rows. I want to write a function that goes through each row and feeds the values of the matching rows into the scipy.stats.pearsonr() function. How would I do this?

Something like:
import scipy.stats
for index, row in df1.iterrows():
    print(scipy.stats.pearsonr(row, df2.loc[index]))

If you just want the function, try this:
import pandas as pd
from scipy.stats import pearsonr
df1 = pd.DataFrame(
    {
        'A': [0,2,3,4,5],
        'B': [2,3,4,5,6],
        'C': [5,6,7,8,9],
    }
)
df2 = pd.DataFrame(
    {
        'A': [2,1,3,4,5],
        'B': [3,2,4,5,6],
        'C': [7,7,7,3,3],
    }
)
def pandas_pearsonr(df1, df2):
    assert len(df1) == len(df2)
    coefs = []
    for i in range(0, len(df1)):
        # (coefficient, p-value) for row i of each frame
        coefs.append(pearsonr(df1.iloc[i].values, df2.iloc[i].values))
    return pd.DataFrame(index=df1.index, data=coefs, columns=['coef', 'p-value'])
pandas_pearsonr(df1, df2)
Output looks like this:
       coef   p-value
0  0.976221  0.139109
1  0.996271  0.054996
2  1.000000  0.000000
3 -0.720577  0.487754
4 -0.838628  0.366717
But I think it could be more Pythonic; maybe you can use pandas.DataFrame.corr.
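DataFrame.corr correlates columns within a single frame. For matching rows of two frames, DataFrame.corrwith with axis=1 is the closer fit. Below is a sketch of that route, assuming df1 and df2 share the same index and columns. corrwith returns only the coefficients, so this sketch still calls pearsonr for the p-values.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

df1 = pd.DataFrame({'A': [0, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [5, 6, 7, 8, 9]})
df2 = pd.DataFrame({'A': [2, 1, 3, 4, 5], 'B': [3, 2, 4, 5, 6], 'C': [7, 7, 7, 3, 3]})

# Row-wise Pearson coefficient between rows with the same index label
coef = df1.corrwith(df2, axis=1)

# corrwith gives no p-values; pearsonr's result can be indexed like a tuple
pvals = [pearsonr(r1, r2)[1] for (_, r1), (_, r2) in zip(df1.iterrows(), df2.iterrows())]

result = pd.DataFrame({'coef': coef, 'p-value': pvals})
print(result)
```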

Pandas Symmetric Difference a list value between 2 columns

I have the following dataframe, df:
                    A                   B
0  [ACL1, ACL2, ACL3]  [ACL1, ACL4, ACL2]
I want to perform a symmetric_difference on the A and B lists so that the output is [ACL3, ACL4]:
df1 = df['A'].symmetric_difference(df['B'])
print (df1)
AttributeError: 'Series' object has no attribute 'symmetric_difference'
But it gives the above error. What did I do wrong? How can I get the desired output?
Thanks..

The problem is that symmetric_difference is a method of sets, not of Series. Instead you could do:
import pandas as pd
data = [[['ACL1', 'ACL2', 'ACL3'], ['ACL1', 'ACL4', 'ACL2']]]
df = pd.DataFrame(data=data, columns=['A', 'B'])
def symmetric_difference(x):
    # convert column A's list to a set, then compare it with column B's list
    return list(set(x.A).symmetric_difference(x.B))
result = df[['A', 'B']].apply(symmetric_difference, axis=1)
print(result)
Output (element order may vary, since sets are unordered):
0 [ACL3, ACL4]
dtype: object

If you care about performance, a list comprehension avoids apply:
[list(set(x).symmetric_difference(set(y))) for x, y in zip(df.A, df.B)]
[['ACL3', 'ACL4']]
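Below is a sketch of the list-comprehension approach that stores the result as a new column. It uses the ^ set operator, which is the same symmetric difference. sorted() makes the element order deterministic, because iterating a set has no guaranteed order. The column name sym_diff is hypothetical.

```python
import pandas as pd

data = [[['ACL1', 'ACL2', 'ACL3'], ['ACL1', 'ACL4', 'ACL2']]]
df = pd.DataFrame(data=data, columns=['A', 'B'])

# set(x) ^ set(y) keeps elements that appear in exactly one of the two lists
diffs = [sorted(set(x) ^ set(y)) for x, y in zip(df['A'], df['B'])]

# wrap in a Series so each row's list is stored as a single cell value
df['sym_diff'] = pd.Series(diffs, index=df.index)
print(df['sym_diff'].tolist())  # [['ACL3', 'ACL4']]
```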

Why is my dataframe not appending? [duplicate]

I know that there are several ways to build up a dataframe in Pandas. My question is simply to understand why the method below doesn't work.
First, a working example. I can create an empty dataframe and then append a new one, similar to the documentation:
In [3]: df1 = pd.DataFrame([[1,2],], columns = ['a', 'b'])
...: df2 = pd.DataFrame()
...: df2.append(df1)
Out[3]: a b
0 1 2
However, if I do the following, df2 is still empty:
In [10]: df1 = pd.DataFrame([[1,2],], columns = ['a', 'b'])
    ...: df2 = pd.DataFrame()
    ...: for i in range(10):
    ...:     df2.append(df1)
In [11]: df2
Out[11]:
Empty DataFrame
Columns: []
Index: []
Can someone explain why it works this way? Thanks!

This happens because the .append() method returns a new DataFrame instead of modifying df2:
Pandas Docs (0.19.2):
pandas.DataFrame.append
Returns: appended: DataFrame
Here's a working example so you can see what's happening in each iteration of the loop:
df1 = pd.DataFrame([[1,2],], columns=['a','b'])
df2 = pd.DataFrame()
for i in range(0,2):
    print(df2.append(df1))
> a b
> 0 1 2
> a b
> 0 1 2
If you assign the output of .append() to a DataFrame (even the same one), you'll get what you probably expected:
for i in range(0,2):
    df2 = df2.append(df1)
print(df2)
> a b
> 0 1 2
> 0 1 2

I think what you are looking for is:
df1 = pd.DataFrame()
df2 = pd.DataFrame([[1,2,3],], columns=['a','b','c'])
for i in range(0,4):
    df1 = df1.append(df2)
df1

df.append() returns a new object. df2 is an empty DataFrame initially, and it does not change. If you do df3 = df2.append(df1), you will get what you want.
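DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples above only run on older versions. Below is a sketch of the same loop with pd.concat. It collects the pieces in a Python list and concatenates once at the end, which also avoids copying a growing frame on every iteration. As with append, pd.concat returns a new object, so its result must be assigned.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2]], columns=['a', 'b'])

pieces = []
for i in range(10):
    pieces.append(df1)  # list.append mutates the list in place

# pd.concat returns a new DataFrame; ignore_index renumbers the rows 0..9
df2 = pd.concat(pieces, ignore_index=True)
print(df2.shape)  # (10, 2)
```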
