Here is an example code to demonstrate my problem:
import numpy as np
import pandas as pd
np.random.seed(10)
df = pd.DataFrame(np.random.randint(0,10,size=(100, 2)), columns=list('xy'))
df
x y
0 9 4
1 0 1
2 9 0
3 1 8
4 9 0
... ... ...
95 0 4
96 6 4
97 9 8
98 0 7
99 1 7
groups = df.groupby(['x'])
groups.size()
x
0 11
1 12
2 15
3 13
4 14
5 5
6 6
7 9
8 5
9 10
dtype: int64
How can I access the x-values as a column and the aggregated y-values as a second column to plot x versus y?
Two options.
Use reset_index():
groups = df.groupby(['x']).size().reset_index(name='size')
Add as_index=False to groupby:
groups = df.groupby(['x'], as_index=False).size()
Output for both:
>>> groups
x size
0 0 16
1 1 9
2 2 9
3 3 5
4 4 7
5 5 10
6 6 10
7 7 7
8 8 12
9 9 15
IIUC, use as_index=False:
groups = df.groupby(['x'], as_index=False)
out = groups.size()
out.plot(x='x', y='size')
If you only want to plot, you can also keep the x as index:
df.groupby(['x']).size().plot()
output:
x size
0 0 16
1 1 9
2 2 9
3 3 5
4 4 7
5 5 10
6 6 10
7 7 7
8 8 12
9 9 15
This question already has answers here:
Pandas Melt Function
(2 answers)
Closed 1 year ago.
How can I transform a Dataframe with columns S (start), E (end), V (value)
S E V
1 2 3
2 5 11
5 11 5
And transform it to:
T V
1 3
2 3
2 11
5 11
5 5
11 5
?
This is so that we can plot the data with in such a way the value V (y-axis) is the same throughout the interval.
Edit:
Some are suggesting this is the same as a "how do I use melt()?" question. However the order of the result is important.
Or with set_index/stack:
df = df.set_index('V').stack().reset_index(-1, drop =True).reset_index(name = 'T')
OUTPUT:
V T
0 3 1
1 3 2
2 11 2
3 11 5
4 5 5
5 5 11
Try with melt
df.melt('V')
Out[39]:
V variable value
0 3 S 1
1 11 S 2
2 5 S 5
3 3 E 2
4 11 E 5
5 5 E 11
Given pandas multiple columns as below
cl_a cl_b cl_c cl_d cl_e
0 1 a 5 6 20
1 2 b 4 7 21
2 3 c 3 8 22
3 4 d 2 9 23
4 5 e 1 10 24
I would like to stack the column cl_c cl_d cl_e into a single column with the name ax. But, please note that, the columns cl_a cl_b were maintained.
cl_a cl_b ax from_col
1,a,5,cl_c
2,b,4,cl_c
3,c,3,cl_c
4,d,2,cl_c
5,e,1,cl_c
1,a,6,cl_d
2,b,7,cl_d
3,c,8,cl_d
4,d,9,cl_d
5,e,10,cl_d
1,a,20,cl_e
2,b,21,cl_e
3,c,22,cl_e
4,d,23,cl_e
5,e,24,cl_e
So far, the following code does the job
df = pd.DataFrame ( {'cl_a': [1,2,3,4,5], 'cl_b': ['a','b','c','d','e'],
'cl_c': [5,4,3,2,1],'cl_d': [6,7,8,9,10],
'cl_e': [20,21,22,23,24]})
df_new = pd.DataFrame()
for col_name in ['cl_c','cl_d','cl_e']:
df_new=df_new.append (df [['cl_a', 'cl_b', col_name]].rename(columns={col_name: "ax"}))
However, I am curious whether there is Pandas build-in approach that can do the trick
Edit:
Upon Quong answer, I realise of the need to include another column (i.e., from_col) beside the ax. The from_col indicate the origin of ax previous column name.
Yes, it's called melt:
df.melt(['cl_a','cl_b'], value_name='ax').drop(columns='variable')
Output:
cl_a cl_b ax
0 1 a 5
1 2 b 4
2 3 c 3
3 4 d 2
4 5 e 1
5 1 a 6
6 2 b 7
7 3 c 8
8 4 d 9
9 5 e 10
10 1 a 20
11 2 b 21
12 3 c 22
13 4 d 23
14 5 e 24
Or equivalently set_index().stack():
(df.set_index(['cl_a','cl_b']).stack()
.reset_index(level=-1, drop=True)
.reset_index(name='ax')
)
with a slightly different output:
cl_a cl_b ax
0 1 a 5
1 1 a 6
2 1 a 20
3 2 b 4
4 2 b 7
5 2 b 21
6 3 c 3
7 3 c 8
8 3 c 22
9 4 d 2
10 4 d 9
11 4 d 23
12 5 e 1
13 5 e 10
14 5 e 24
Goal: I want to split one single column by elements (not the strings cells) and, from that division, create new columns, where the element is the title of the new column and the other values from another columns compose the respective column.
There is a way of doing that with pandas? Thanks in advance.
Example:
[IN]:
A 1
A 2
A 6
A 99
B 7
B 8
B 19
B 18
[OUT]:
A B
1 7
2 8
6 19
99 18
Just an alternative if 2 column input data:
print(df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1=pd.DataFrame(df.groupby('col1')['col2'].apply(list).to_dict())
print(df1)
A B
0 1 7
1 2 8
2 6 19
3 99 18
Use Series.str.split with GroupBy.cumcount for counter, then reshape by DataFrame.set_index with Series.unstack:
print (df)
col
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1 = df['col'].str.split(expand=True)
g = df1.groupby(0).cumcount()
df2 = df1.set_index([0, g])[1].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
If 2 columns input data:
print (df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
g = df.groupby('col1').cumcount()
df2 = df.set_index(['col1', g])['col2'].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
How can I prepend a dataframe to another dataframe? Consider dataframe A:
b c d
2 3 4
6 7 8
and dataFrame B:
a
1
5
I want to prepend A to B to get:
a b c d
1 2 3 4
5 6 7 8
2 methods:
In [1]: df1 = DataFrame(randint(0,10,size=(12)).reshape(4,3),columns=list('bcd'))
In [2]: df1
Out[2]:
b c d
0 5 9 5
1 8 4 0
2 8 4 5
3 4 9 2
In [3]: df2 = DataFrame(randint(0,10,size=(4)).reshape(4,1),columns=list('a'))
In [4]: df2
Out[4]:
a
0 4
1 9
2 2
3 0
Concating (returns a new frame)
In [6]: pd.concat([df2,df1],axis=1)
Out[6]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Insert, puts a series into an existing frame
In [8]: df1.insert(0,'a',df2['a'])
In [9]: df1
Out[9]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Achieved by doing
A[B.columns]=B