Pandas dataframe rename column - pandas

I split a dataframe into two parts and changed their column names separately. Here's what I did:
df1 = df[df['colname']==0]
df2 = df[df['colname']==1]
df1.columns = [ 'a'+ x for x in df1.columns]
df2.columns = [ 'b'+ x for x in df2.columns]
It turned out that the columns of df2 start with 'ba' rather than 'b'. What happened?

I cannot reproduce your problem; it works fine for me.
An alternative is to use add_prefix instead of a list comprehension:
df = pd.DataFrame({'colname':[0,1,0,0,0,1],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
   C  D  E  F  colname
0  7  1  5  a        0
1  8  3  3  a        1
2  9  5  6  a        0
3  4  7  9  b        0
4  2  1  2  b        0
5  3  0  4  b        1
df1 = df[df['colname']==0].add_prefix('a')
df2 = df[df['colname']==1].add_prefix('b')
print (df1)
   aC  aD  aE  aF  acolname
0   7   1   5   a         0
2   9   5   6   a         0
3   4   7   9   b         0
4   2   1   2   b         0
print (df2)
   bC  bD  bE  bF  bcolname
1   8   3   3   a         1
5   3   0   4   b         1
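For anyone who wants to check this quickly, here is a minimal self-contained sketch of the same idea. The small frame is made up for illustration (it is not the asker's data). It only shows that add_prefix returns a new frame each time, so neither half can pick up the other's prefix and the original df keeps its names. rename with a callable is an equivalent spelling.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({'colname': [0, 1, 0, 1], 'C': [7, 8, 9, 4]})

# Filter each half with a boolean mask, then prefix its column names.
# add_prefix returns a new DataFrame and leaves df untouched.
df1 = df[df['colname'] == 0].add_prefix('a')
df2 = df[df['colname'] == 1].add_prefix('b')

# Equivalent spelling using rename with a callable
df2_alt = df[df['colname'] == 1].rename(columns=lambda c: 'b' + c)

print(list(df1.columns))
print(list(df2.columns))
print(list(df.columns))
```

If the 'ba' prefix still shows up with the original list comprehension, it may be worth checking whether the renaming cell was run more than once, but that is only a guess.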

Related

Maximum of calculated pandas column and 0

I have a very simple problem (I guess), but I can't find the right syntax for it.
Given the following DataFrame:
A B C
0 7 12 2
1 5 4 4
2 4 8 2
3 9 2 3
I need to create a new column D equal, for each row, to max(0, A-B+C).
I tried np.maximum(df.A-df.B+df.C, 0), but it doesn't give what I expect: I get the maximum value of the calculated column in every row (= 10 in the example).
In the end, I would like to obtain the DataFrame below:
A B C D
0 7 12 2 0
1 5 4 4 5
2 4 8 2 0
3 9 2 3 10
Any help appreciated
Thanks
Let us try
df['D'] = df.eval('A-B+C').clip(lower=0)
Out[256]:
0 0
1 5
2 0
3 10
dtype: int64
You can use np.where:
s = df["A"]-df["B"]+df["C"]
df["D"] = np.where(s>0, s, 0) #or s.where(s>0, 0)
print (df)
A B C D
0 7 12 2 0
1 5 4 4 5
2 4 8 2 0
3 9 2 3 10
To do this in one line, you can use apply to apply the max function to each row separately.
In [19]: df['D'] = df.apply(lambda s: max(s['A'] - s['B'] + s['C'], 0), axis=1)
In [20]: df
Out[20]:
   A   B  C   D
0  7  12  2   0
1  5   4  4   5
2  4   8  2   0
3  9   2  3  10
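As a cross-check on the answers above: np.maximum is itself element-wise (unlike np.max or Series.max, which reduce the whole column to one number), so it should already give the wanted column. Below is a minimal sketch on the question's data comparing the three vectorized options. The variable names are mine, and whether the asker actually ended up calling a reducing function is only a guess.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [7, 5, 4, 9], 'B': [12, 4, 8, 2], 'C': [2, 4, 2, 3]})
s = df['A'] - df['B'] + df['C']      # row-wise values: -3, 5, -2, 10

via_maximum = np.maximum(s, 0)       # element-wise maximum of s and 0
via_clip = s.clip(lower=0)           # same result with a pandas method
via_where = np.where(s > 0, s, 0)    # same result as a NumPy array

scalar_max = s.max()                 # a single number for the whole column

df['D'] = via_clip
print(df)
```

All three vectorized variants avoid the Python-level loop that apply with axis=1 performs, which usually matters on larger frames.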

How to concatenate a dictionary of pandas DataFrames into a single DataFrame?

I have three DataFrames, each containing a single row:
dfA = pd.DataFrame( {'A':[3], 'B':[2], 'C':[1], 'D':[0]} )
dfB = pd.DataFrame( {'A':[9], 'B':[3], 'C':[5], 'D':[1]} )
dfC = pd.DataFrame( {'A':[3], 'B':[4], 'C':[7], 'D':[8]} )
for instance dfA is
A B C D
0 3 2 1 0
I organize them in a dictionary:
data = {'row_1': dfA, 'row_2': dfB, 'row_3': dfC}
I want to concatenate them into a single DataFrame
ans = pd.concat(data)
which returns
         A  B  C  D
row_1 0  3  2  1  0
row_2 0  9  3  5  1
row_3 0  3  4  7  8
whereas I want to obtain this
A B C D
row_1 3 2 1 0
row_2 9 3 5 1
row_3 3 4 7 8
That is to say I want to "drop" an index column.
How do I do this?
Use DataFrame.reset_index with second level and parameter drop=True:
df = ans.reset_index(level=1, drop=True)
print (df)
A B C D
row_1 3 2 1 0
row_2 9 3 5 1
row_3 3 4 7 8
You can reset index:
pd.concat(data).reset_index(level=-1,drop=True)
Output:
A B C D
row_1 3 2 1 0
row_2 9 3 5 1
row_3 3 4 7 8
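Another spelling of the same thing, as a sketch: DataFrame.droplevel removes one level of the MultiIndex directly. This assumes a pandas version that has droplevel on DataFrame (0.24 or later, as far as I know). The data is the question's.

```python
import pandas as pd

dfA = pd.DataFrame({'A': [3], 'B': [2], 'C': [1], 'D': [0]})
dfB = pd.DataFrame({'A': [9], 'B': [3], 'C': [5], 'D': [1]})
dfC = pd.DataFrame({'A': [3], 'B': [4], 'C': [7], 'D': [8]})
data = {'row_1': dfA, 'row_2': dfB, 'row_3': dfC}

# concat on a dict uses the keys as the outer index level
ans = pd.concat(data)

# Drop the inner level (the original 0-based row labels)
flat = ans.droplevel(1)

# The reset_index variant from the answers, for comparison
same = ans.reset_index(level=1, drop=True)

print(flat)
```

Both variants only work cleanly here because every frame has exactly one row. With longer frames the dictionary keys would repeat in the resulting index.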

Split a column by element and create new ones with pandas

Goal: I want to split a single column by its distinct elements (not by splitting the string cells) and create new columns from that division, where each element becomes the title of a new column and the corresponding values from the other column fill it.
Is there a way of doing that with pandas? Thanks in advance.
Example:
[IN]:
A 1
A 2
A 6
A 99
B 7
B 8
B 19
B 18
[OUT]:
A B
1 7
2 8
6 19
99 18
Just an alternative if the input data has two columns:
print(df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1=pd.DataFrame(df.groupby('col1')['col2'].apply(list).to_dict())
print(df1)
A B
0 1 7
1 2 8
2 6 19
3 99 18
Use Series.str.split with GroupBy.cumcount for counter, then reshape by DataFrame.set_index with Series.unstack:
print (df)
col
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1 = df['col'].str.split(expand=True)
g = df1.groupby(0).cumcount()
df2 = df1.set_index([0, g])[1].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
If the input data has two columns:
print (df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
g = df.groupby('col1').cumcount()
df2 = df.set_index(['col1', g])['col2'].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
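One practical difference between the two answers, sketched below on made-up data where group B has one value fewer than group A: the cumcount/unstack approach pads the shorter group with NaN, while building a DataFrame from a dict of lists raises a ValueError when the lists differ in length. The sample frame and the dict_ok flag are hypothetical, added only for this illustration.

```python
import pandas as pd

# Hypothetical input: group A has 4 values, group B only 3
df = pd.DataFrame({'col1': list('AAAABBB'),
                   'col2': [1, 2, 6, 99, 7, 8, 19]})

# Counter within each group: 0, 1, 2, ... restarting per value of col1
g = df.groupby('col1').cumcount()

# Use (group, counter) as the index, then move the group level to columns
wide = df.set_index(['col1', g])['col2'].unstack(0).rename_axis(None, axis=1)
print(wide)

# The dict-of-lists approach needs equally sized groups
try:
    pd.DataFrame(df.groupby('col1')['col2'].apply(list).to_dict())
    dict_ok = True
except ValueError:
    dict_ok = False
print(dict_ok)
```

Note that the missing cell turns the result into floats, so cast back afterwards if integers are needed.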

Need to loop over pandas series to find indices of variable

I have a dataframe and a list. I would like to iterate over the elements of the list, find their location in the dataframe, and store the result in a new dataframe.
my_list = ['1','2','3','4','5']
df1 = pd.DataFrame(my_list, columns=['Num'])
dataframe : df1
Num
0 1
1 2
2 3
3 4
4 5
dataframe : df2
0 1 2 3 4
0 9 12 8 6 7
1 11 1 4 10 13
2 5 14 2 0 3
I've tried something similar to this, but it doesn't work:
for x in my_list:
    i,j= np.array(np.where(df==x)).tolist()
    df2['X'] = df.append(i)
    df2['Y'] = df.append(j)
So I'm looking for a result like this:
dataframe : df1 updated
Num X Y
0 1 1 1
1 2 2 2
2 3 2 4
3 4 1 2
4 5 2 0
any hints or ideas would be appreciated
Instead of trying to find each value in df2, why not just flatten df2 into a long dataframe?
df2 = pd.melt(df2)
df2.reset_index(inplace=True)
df2.columns = ['X', 'Y', 'Num']
so now your df2 just looks like this:
Index X Y Num
0 0 0 9
1 1 0 11
2 2 0 5
3 3 1 12
4 4 1 1
5 5 1 14
You can of course sort by Num and if you just want the values from your list you can further filter df2:
df2 = df2[df2.Num.isin(my_list)]
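Two things to watch with the melt approach as written: X ends up being the running index of the melted frame rather than the original row label, and my_list holds strings, so isin will only match if df2 holds strings too. Here is a sketch of a variant that keeps both the row and the column label by using stack instead of melt. It assumes df2 holds integers (the question does not say) and uses the X/Y naming from the desired output.

```python
import pandas as pd

my_list = ['1', '2', '3', '4', '5']
df1 = pd.DataFrame(my_list, columns=['Num'])

# Assumption: df2 contains integers, as the printed frame suggests
df2 = pd.DataFrame([[9, 12, 8, 6, 7],
                    [11, 1, 4, 10, 13],
                    [5, 14, 2, 0, 3]])

# stack() gives a Series indexed by (row label, column label);
# name the two index levels X and Y and the values Num
pos = df2.stack().rename('Num').rename_axis(['X', 'Y']).reset_index()

# Convert the string list entries to integers before matching
df1['Num'] = df1['Num'].astype(int)

# A left merge keeps the order of df1 and attaches each position
result = df1.merge(pos, on='Num', how='left')
print(result)
```

If a value occurred more than once in df2, the merge would return one row per occurrence.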

How to prepend pandas data frames

How can I prepend a dataframe to another dataframe? Consider dataframe A:
b c d
2 3 4
6 7 8
and dataFrame B:
a
1
5
I want to prepend A to B to get:
a b c d
1 2 3 4
5 6 7 8
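Two methods (the session below assumes import pandas as pd, from pandas import DataFrame, and from numpy.random import randint):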
In [1]: df1 = DataFrame(randint(0,10,size=(12)).reshape(4,3),columns=list('bcd'))
In [2]: df1
Out[2]:
b c d
0 5 9 5
1 8 4 0
2 8 4 5
3 4 9 2
In [3]: df2 = DataFrame(randint(0,10,size=(4)).reshape(4,1),columns=list('a'))
In [4]: df2
Out[4]:
a
0 4
1 9
2 2
3 0
Concatenating (returns a new frame):
In [6]: pd.concat([df2,df1],axis=1)
Out[6]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Insert (puts a series into an existing frame, modifying it in place):
In [8]: df1.insert(0,'a',df2['a'])
In [9]: df1
Out[9]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Achieved by doing
A[B.columns]=B
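For completeness, a self-contained sketch of the concat and insert routes on the exact frames from the question, with explicit imports. The names out and A_inplace are mine. Both routes align rows on the index, so they assume A and B share the same index (here the default 0, 1).

```python
import pandas as pd

A = pd.DataFrame({'b': [2, 6], 'c': [3, 7], 'd': [4, 8]})
B = pd.DataFrame({'a': [1, 5]})

# Route 1: concat along the columns; B first, so 'a' ends up on the left.
# Returns a new frame and leaves A and B untouched.
out = pd.concat([B, A], axis=1)

# Route 2: insert the column at position 0; this modifies the frame
# in place, so work on a copy if A must stay unchanged.
A_inplace = A.copy()
A_inplace.insert(0, 'a', B['a'])

print(out)
print(A_inplace)
```

concat generalizes to several columns in B at once, while insert adds one column per call.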