How to concatenate a dictionary of pandas DataFrames into a single DataFrame?

I have three DataFrames, each containing a single row:
dfA = pd.DataFrame( {'A':[3], 'B':[2], 'C':[1], 'D':[0]} )
dfB = pd.DataFrame( {'A':[9], 'B':[3], 'C':[5], 'D':[1]} )
dfC = pd.DataFrame( {'A':[3], 'B':[4], 'C':[7], 'D':[8]} )
For instance, dfA is:
   A  B  C  D
0  3  2  1  0
I organize them in a dictionary:
data = {'row_1': dfA, 'row_2': dfB, 'row_3': dfC}
I want to concatenate them into a single DataFrame:
ans = pd.concat(data)
which returns a DataFrame with a two-level index:
         A  B  C  D
row_1 0  3  2  1  0
row_2 0  9  3  5  1
row_3 0  3  4  7  8
whereas I want to obtain this:
       A  B  C  D
row_1  3  2  1  0
row_2  9  3  5  1
row_3  3  4  7  8
That is, I want to drop the inner index level (the column of zeros).
How do I do this?

Use DataFrame.reset_index on the second index level (level=1) with drop=True, so that the level is discarded instead of being inserted as a column:
df = ans.reset_index(level=1, drop=True)
print(df)
       A  B  C  D
row_1  3  2  1  0
row_2  9  3  5  1
row_3  3  4  7  8

You can also reset the innermost index level by referring to it as level=-1:
pd.concat(data).reset_index(level=-1, drop=True)
Output:
       A  B  C  D
row_1  3  2  1  0
row_2  9  3  5  1
row_3  3  4  7  8
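
For completeness, here is a self-contained sketch of the whole round trip using the question's data. It uses DataFrame.droplevel as an alternative to reset_index(..., drop=True); the variable names stacked and result are just illustrative.

```python
import pandas as pd

dfA = pd.DataFrame({'A': [3], 'B': [2], 'C': [1], 'D': [0]})
dfB = pd.DataFrame({'A': [9], 'B': [3], 'C': [5], 'D': [1]})
dfC = pd.DataFrame({'A': [3], 'B': [4], 'C': [7], 'D': [8]})
data = {'row_1': dfA, 'row_2': dfB, 'row_3': dfC}

# pd.concat on a dict uses the keys as the outer index level, so the
# result has a two-level MultiIndex: (dict key, original row label).
stacked = pd.concat(data)

# Drop the inner level (the original 0 labels) and keep only the dict keys.
result = stacked.droplevel(-1)
print(result)
```

Both approaches should give the same frame here; droplevel simply states the intent (remove one index level) a little more directly.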

Related

Keep the DataFrame index name after appending a list of Series?

I want to keep the index name of this DataFrame after appending a list of Series. The name is kept when I append the Series one at a time, but not when I append them as a list:
df = pd.DataFrame([[1,2],[3,4]], index=pd.Index(['a','b'], name='keepthisname'))
              0  1
keepthisname
a             1  2
b             3  4
serc = pd.Series([5,6], name='c')
0    5
1    6
Name: c, dtype: int64
dfc = df.append(serc) # one at a time works
              0  1
keepthisname
a             1  2
b             3  4
c             5  6
serd = pd.Series([7,8], name='d') # as further evidenced with this...
dfc.append(serd)
              0  1
keepthisname
a             1  2
b             3  4
c             5  6
d             7  8
df.append([serc,serd]) # but this wipes out the name of the index
   0  1
a  1  2
b  3  4
c  5  6
d  7  8
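
Note that DataFrame.append was removed in pandas 2.0, so on current versions the same operation has to go through pd.concat. The following is only a sketch of one possible workaround, not an answer from the original thread: it builds the new rows from the list of Series and then restores the index name explicitly with rename_axis. The names new_rows and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]],
                  index=pd.Index(['a', 'b'], name='keepthisname'))
serc = pd.Series([5, 6], name='c')
serd = pd.Series([7, 8], name='d')

# A list of Series becomes a DataFrame with one row per Series,
# labelled by each Series' name.
new_rows = pd.DataFrame([serc, serd])

# df's index is named and new_rows' index is not, so set the name
# explicitly instead of relying on what concat infers.
out = pd.concat([df, new_rows]).rename_axis(df.index.name)
print(out)
```

Setting the name after the concatenation keeps the result independent of how a given pandas version reconciles differing index names.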

Pandas dataframe rename column

I split a DataFrame into two parts and changed their column names separately. Here's what I did:
df1 = df[df['colname'] == 0]
df2 = df[df['colname'] == 1]
df1.columns = ['a' + x for x in df1.columns]
df2.columns = ['b' + x for x in df2.columns]
It turned out that df2's column names start with 'ba' rather than 'b'. What happened?

I cannot reproduce your problem; it works fine for me.
An alternative is to use add_prefix instead of a list comprehension:
df = pd.DataFrame({'colname':[0,1,0,0,0,1],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})
print(df)
   C  D  E  F  colname
0  7  1  5  a        0
1  8  3  3  a        1
2  9  5  6  a        0
3  4  7  9  b        0
4  2  1  2  b        0
5  3  0  4  b        1
df1 = df[df['colname']==0].add_prefix('a')
df2 = df[df['colname']==1].add_prefix('b')
print(df1)
   aC  aD  aE aF  acolname
0   7   1   5  a         0
2   9   5   6  a         0
3   4   7   9  b         0
4   2   1   2  b         0
print(df2)
   bC  bD  bE bF  bcolname
1   8   3   3  a         1
5   3   0   4  b         1

Renaming column of one dataframe by extracting from combination of series and dataframe column names

In the code below, I rename the columns of the pnlsummary DataFrame using the names of three Series (totalheldmw, totalcost and totalsellprofit) and the column names of one DataFrame (totalheldprofit).
My difficulty is iterating over the column names of the DataFrame. I have assigned the names manually, as you can see below, but I suppose there is a more efficient way to iterate over them. Please advise.
pnlsummary.columns = [
    totalheldmw.name[0], totalcost.name[0], totalsellprofit.name[0],
    totalheldprofit.columns[0], totalheldprofit.columns[1],
    totalheldprofit.columns[2], totalheldprofit.columns[3],
]

I think you need to create a list from the Series names and then add the DataFrame's column names converted to a list (the slice [0:4] covers the four columns 0 to 3 used in the question):
pnlsummary.columns = ([totalheldmw.name[0], totalcost.name[0], totalsellprofit.name[0]] +
                      totalheldprofit.columns[0:4].astype(str).tolist())
Sample:
df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})
print(df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
df.columns = ['a','s','d'] + df.columns[0:3].tolist()
print(df)
   a  s  d  A  B  C
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

Need to loop over pandas series to find indices of variable

I have a DataFrame and a list. I would like to iterate over the elements of the list, find their location in the DataFrame, and store the result in a new DataFrame.
my_list = ['1','2','3','4','5']
df1 = pd.DataFrame(my_list, columns=['Num'])
DataFrame df1:
  Num
0   1
1   2
2   3
3   4
4   5
DataFrame df2:
    0   1  2   3   4
0   9  12  8   6   7
1  11   1  4  10  13
2   5  14  2   0   3
I've tried something similar to this, but it doesn't work:
for x in my_list:
    i,j= np.array(np.where(df==x)).tolist()
    df2['X'] = df.append(i)
    df2['Y'] = df.append(j)
So I am looking for a result like this, where X is the row and Y is the column of each value in df2.
DataFrame df1, updated:
  Num  X  Y
0   1  1  1
1   2  2  2
2   3  2  4
3   4  1  2
4   5  2  0
Any hints or ideas would be appreciated.

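Instead of searching for each value in df2, why not flatten df2 into a long DataFrame? Move the row labels into a column first, so that melt keeps them:
df2 = df2.reset_index().melt(id_vars='index')
df2.columns = ['X', 'Y', 'Num']
Now df2 holds one row per cell, with X as the original row and Y as the original column. Its first rows look like this:
   X  Y  Num
0  0  0    9
1  1  0   11
2  2  0    5
3  0  1   12
4  1  1    1
5  2  1   14
You can of course sort by Num, and if you only want the values from your list you can filter df2 further. Since my_list holds strings and Num holds integers, convert the list first:
df2 = df2[df2.Num.isin([int(x) for x in my_list])]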
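
As a rough end-to-end sketch of the flattening idea, the snippet below rebuilds the question's data, melts df2 while keeping its row labels, and merges the positions back onto the list values. The names flat, lookup and result are illustrative, and casting the strings in my_list to int is an assumption about the intended matching.

```python
import pandas as pd

my_list = ['1', '2', '3', '4', '5']
df1 = pd.DataFrame(my_list, columns=['Num'])
df2 = pd.DataFrame([[9, 12, 8, 6, 7],
                    [11, 1, 4, 10, 13],
                    [5, 14, 2, 0, 3]])

# One row per cell of df2: X is the original row label, Y the column label.
flat = (df2.reset_index()
           .melt(id_vars='index', var_name='Y', value_name='Num')
           .rename(columns={'index': 'X'}))

# my_list holds strings while df2 holds integers, so align the types first.
lookup = df1.assign(Num=df1['Num'].astype(int))

# A left merge keeps the order of the list and attaches each value's position.
result = lookup.merge(flat, on='Num', how='left')[['Num', 'X', 'Y']]
print(result)
```

This relies on every value occurring at most once in df2; a value that appears several times would produce one output row per occurrence.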

Group by with a pandas dataframe using different aggregation for different columns

I have a pandas dataframe df with columns [a, b, c, d, e, f]. I want to perform a group by on df. I can best describe what it's supposed to do in SQL:
SELECT a, b, min(c), min(d), max(e), sum(f)
FROM df
GROUP BY a, b
How do I do this group by using pandas on my DataFrame df?
Consider df:
a  b  c  d  e  f
1  1  2  5  9  3
1  1  3  3  4  5
2  2  4  7  4  4
2  2  5  3  8  8
I expect the result to be:
a  b  c  d  e   f
1  1  2  3  9   8
2  2  4  3  8  12

Use agg with a dictionary that maps each column to its aggregation function:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    dict(
        a=list('aaaabbbb'),
        b=list('ccddccdd'),
        c=np.arange(8),
        d=np.arange(8),
        e=np.arange(8),
        f=np.arange(8),
    )
)
funcs = dict(c='min', d='min', e='max', f='sum')
df.groupby(['a', 'b']).agg(funcs).reset_index()
   a  b  c  e   f  d
0  a  c  0  1   1  0
1  a  d  2  3   5  2
2  b  c  4  5   9  4
3  b  d  6  7  13  6
With your data:
   a  b  c  e   f  d
0  1  1  2  9   8  3
1  2  2  4  8  12  3
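
The same aggregation can also be sketched with named aggregation (available in pandas 0.25 and later), which fixes the output column order explicitly. This is an illustrative variant on the question's data rather than part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': [1, 1, 2, 2],
    'c': [2, 3, 4, 5],
    'd': [5, 3, 7, 3],
    'e': [9, 4, 4, 8],
    'f': [3, 5, 4, 8],
})

# Equivalent of:
#   SELECT a, b, min(c), min(d), max(e), sum(f) FROM df GROUP BY a, b
# Each keyword is an output column: (input column, aggregation function).
result = (df.groupby(['a', 'b'])
            .agg(c=('c', 'min'), d=('d', 'min'),
                 e=('e', 'max'), f=('f', 'sum'))
            .reset_index())
print(result)
```

The keyword names also allow renaming on the fly, for example c_min=('c', 'min'), when the same column needs more than one aggregate.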
