Pandas pivot columns based on column name prefix

I have a dataframe:
df = AG_Speed AG_wolt AB_Speed AB_wolt C1 C2 C3
1 2 3 4 6 7 8
1 9 2 6 4 1 8
And I want to pivot it based on prefix to get:
df = Speed Wolt C1 C2 C3 Category
1 2 6 7 8 AG
3 4 6 7 8 AB
1 9 4 1 8 AG
2 6 4 1 8 AB
What is the best way to do it?

We can use pd.wide_to_long for this. But since it expects the column names to start with the stubnames, we have to reverse the column format:
df.columns = ["_".join(col.split("_")[::-1]) for col in df.columns]
res = pd.wide_to_long(
df,
stubnames=["Speed", "wolt"],
i=["C1", "C2", "C3"],
j="Category",
sep="_",
suffix="[A-Za-z]+"
).reset_index()
C1 C2 C3 Category Speed wolt
0 6 7 8 AG 1 2
1 6 7 8 AB 3 4
2 4 1 8 AG 1 9
3 4 1 8 AB 2 6
If you want the columns in a specific order, use DataFrame.reindex:
res.reindex(columns=["Speed", "wolt", "C1", "C2", "C3", "Category"])
Speed wolt C1 C2 C3 Category
0 1 2 6 7 8 AG
1 3 4 6 7 8 AB
2 1 9 4 1 8 AG
3 2 6 4 1 8 AB
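For reference, the steps above can be put together into one self-contained sketch. The sample frame is rebuilt from the question's data, and the final sort is an extra step added here only to get a predictable row order; it is not part of the answer itself.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "AG_Speed": [1, 1], "AG_wolt": [2, 9],
    "AB_Speed": [3, 2], "AB_wolt": [4, 6],
    "C1": [6, 4], "C2": [7, 1], "C3": [8, 8],
})

# wide_to_long expects "<stub><sep><suffix>", so turn "AG_Speed" into "Speed_AG";
# names without an underscore (C1, C2, C3) are left unchanged
df.columns = ["_".join(col.split("_")[::-1]) for col in df.columns]

res = pd.wide_to_long(
    df,
    stubnames=["Speed", "wolt"],
    i=["C1", "C2", "C3"],   # these columns must uniquely identify each row
    j="Category",
    sep="_",
    suffix="[A-Za-z]+",     # the default suffix pattern only matches digits
).reset_index()

# Extra step (an addition here): sort for a predictable row order
res = res.sort_values(["C1", "Category"]).reset_index(drop=True)
print(res[["Speed", "wolt", "C1", "C2", "C3", "Category"]])
```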

One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index = ['C1', 'C2', 'C3'],
names_to = ('Category', '.value'),
names_sep='_')
C1 C2 C3 Category Speed wolt
0 6 7 8 AG 1 2
1 4 1 8 AG 1 9
2 6 7 8 AB 3 4
3 4 1 8 AB 2 6
In the solution above, the .value placeholder determines which part of each column label stays as a column header; the labels are split on names_sep, and the other part goes into the Category column.
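If installing pyjanitor is not an option, roughly the same idea can be sketched with pandas alone: split the labels into a two-level column index and stack the prefix level. This is an illustration under the assumption that every non-id column has exactly one underscore; it is not how pivot_longer is implemented.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "AG_Speed": [1, 1], "AG_wolt": [2, 9],
    "AB_Speed": [3, 2], "AB_wolt": [4, 6],
    "C1": [6, 4], "C2": [7, 1], "C3": [8, 8],
})

wide = df.set_index(["C1", "C2", "C3"])

# "AG_Speed" -> ("AG", "Speed"): level 0 is the prefix,
# level 1 plays the role of pyjanitor's .value
wide.columns = wide.columns.str.split("_", expand=True)
wide.columns.names = ["Category", None]

# Move the prefix level into the rows; Speed and wolt stay as headers
res = wide.stack(level="Category").reset_index()
print(res)
```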

Related

Stack multiple columns into single column while maintaining other columns in Pandas?

Given pandas multiple columns as below
cl_a cl_b cl_c cl_d cl_e
0 1 a 5 6 20
1 2 b 4 7 21
2 3 c 3 8 22
3 4 d 2 9 23
4 5 e 1 10 24
I would like to stack the columns cl_c, cl_d and cl_e into a single column named ax, while keeping the columns cl_a and cl_b:
cl_a,cl_b,ax,from_col
1,a,5,cl_c
2,b,4,cl_c
3,c,3,cl_c
4,d,2,cl_c
5,e,1,cl_c
1,a,6,cl_d
2,b,7,cl_d
3,c,8,cl_d
4,d,9,cl_d
5,e,10,cl_d
1,a,20,cl_e
2,b,21,cl_e
3,c,22,cl_e
4,d,23,cl_e
5,e,24,cl_e
So far, the following code does the job
import pandas as pd

df = pd.DataFrame({'cl_a': [1, 2, 3, 4, 5], 'cl_b': ['a', 'b', 'c', 'd', 'e'],
                   'cl_c': [5, 4, 3, 2, 1], 'cl_d': [6, 7, 8, 9, 10],
                   'cl_e': [20, 21, 22, 23, 24]})
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate them
pieces = []
for col_name in ['cl_c', 'cl_d', 'cl_e']:
    pieces.append(df[['cl_a', 'cl_b', col_name]].rename(columns={col_name: 'ax'}))
df_new = pd.concat(pieces)
However, I am curious whether there is a built-in pandas approach that can do the trick.
Edit:
Following Quong's answer, I realised I also need another column (from_col) beside ax. The from_col column indicates which original column each ax value came from.
Yes, it's called melt:
df.melt(['cl_a','cl_b'], value_name='ax').drop(columns='variable')
Output:
cl_a cl_b ax
0 1 a 5
1 2 b 4
2 3 c 3
3 4 d 2
4 5 e 1
5 1 a 6
6 2 b 7
7 3 c 8
8 4 d 9
9 5 e 10
10 1 a 20
11 2 b 21
12 3 c 22
13 4 d 23
14 5 e 24
Or equivalently set_index().stack():
(df.set_index(['cl_a','cl_b']).stack()
.reset_index(level=-1, drop=True)
.reset_index(name='ax')
)
with a slightly different output:
cl_a cl_b ax
0 1 a 5
1 1 a 6
2 1 a 20
3 2 b 4
4 2 b 7
5 2 b 21
6 3 c 3
7 3 c 8
8 3 c 22
9 4 d 2
10 4 d 9
11 4 d 23
12 5 e 1
13 5 e 10
14 5 e 24
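The edit to the question also asks for a from_col column recording where each value came from. A minimal sketch, assuming the same sample frame: instead of dropping melt's variable column, name it with var_name.

```python
import pandas as pd

df = pd.DataFrame({'cl_a': [1, 2, 3, 4, 5], 'cl_b': ['a', 'b', 'c', 'd', 'e'],
                   'cl_c': [5, 4, 3, 2, 1], 'cl_d': [6, 7, 8, 9, 10],
                   'cl_e': [20, 21, 22, 23, 24]})

# var_name labels the column holding the original column names,
# value_name labels the column holding the stacked values
out = df.melt(id_vars=['cl_a', 'cl_b'], var_name='from_col', value_name='ax')

# Reorder to match the layout requested in the question
out = out[['cl_a', 'cl_b', 'ax', 'from_col']]
print(out)
```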

How to perform set-like operations in pandas?

I need to fill a column with the values that are present in a set but not present in any of the other columns.
initial df
c0 c1 c2 c3 c4 c5
0 4 5 6 3 2 1
1 1 5 4 0 2 3
2 5 6 4 0 1 3
3 5 4 6 2 0 1
4 5 6 4 0 1 3
5 0 1 4 5 6 2
I need a df['c6'] column holding the set difference between set([0,1,2,3,4,5,6]) and each row of df,
so that the result df is
c0 c1 c2 c3 c4 c5 c6
0 4 5 6 3 2 1 0
1 1 5 4 0 2 3 6
2 5 6 4 0 1 3 2
3 5 4 6 2 0 1 3
4 5 6 4 0 1 3 2
5 0 1 4 5 6 2 3
Thank you!
Slightly different approach:
df['c6'] = sum(range(7)) - df.sum(axis=1)
or if you want to be more verbose:
df['c6'] = sum([0,1,2,3,4,5,6]) - df.sum(axis=1)
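The arithmetic shortcut works because each row holds six distinct values from 0 to 6, so exactly one value is missing and it equals the difference of the two sums. With duplicates or more than one missing value it would silently give a wrong answer. A small sketch, with the frame rebuilt from the question and the name value_cols introduced here for illustration, that cross-checks the shortcut against an explicit set difference:

```python
import pandas as pd

df = pd.DataFrame(
    [[4, 5, 6, 3, 2, 1],
     [1, 5, 4, 0, 2, 3],
     [5, 6, 4, 0, 1, 3],
     [5, 4, 6, 2, 0, 1],
     [5, 6, 4, 0, 1, 3],
     [0, 1, 4, 5, 6, 2]],
    columns=["c0", "c1", "c2", "c3", "c4", "c5"],
)
full = set(range(7))
value_cols = ["c0", "c1", "c2", "c3", "c4", "c5"]  # hypothetical helper name

# Shortcut: the missing value is the total of the full set minus the row total
by_sum = sum(full) - df[value_cols].sum(axis=1)

# Explicit version: the single element left after removing the row's values
by_set = [min(full.difference(row)) for row in df[value_cols].to_numpy()]

# Only trust the shortcut when both methods agree
assert by_sum.tolist() == by_set
df["c6"] = by_sum
print(df)
```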
Use NumPy's setdiff1d to find the difference between the two arrays and assign the output to column c6:
import numpy as np
ck = np.array([0,1,2,3,4,5,6])
M = df.to_numpy()
df['c6'] = [np.setdiff1d(ck,i)[0] for i in M]
c0 c1 c2 c3 c4 c5 c6
0 4 5 6 3 2 1 0
1 1 5 4 0 2 3 6
2 5 6 4 0 1 3 2
3 5 4 6 2 0 1 3
4 5 6 4 0 1 3 2
5 0 1 4 5 6 2 3
A simple way I could think of is using a list comprehension and set difference:
s = {0, 1, 2, 3, 4, 5, 6}
s
{0, 1, 2, 3, 4, 5, 6}
df['c6'] = [tuple(s.difference(vals))[0] for vals in df.values]
df
c0 c1 c2 c3 c4 c5 c6
0 4 5 6 3 2 1 0
1 1 5 4 0 2 3 6
2 5 6 4 0 1 3 2
3 5 4 6 2 0 1 3
4 5 6 4 0 1 3 2
5 0 1 4 5 6 2 3

Split a column by element and create new ones with pandas

Goal: I want to split a single column by its elements (not by the strings inside its cells) and create new columns from that split, where each element becomes the title of a new column and the corresponding values from another column fill it.
Is there a way of doing that with pandas? Thanks in advance.
Example:
[IN]:
A 1
A 2
A 6
A 99
B 7
B 8
B 19
B 18
[OUT]:
A B
1 7
2 8
6 19
99 18
Just an alternative if the input data has 2 columns:
print(df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1=pd.DataFrame(df.groupby('col1')['col2'].apply(list).to_dict())
print(df1)
A B
0 1 7
1 2 8
2 6 19
3 99 18
Use Series.str.split with GroupBy.cumcount for a counter, then reshape with DataFrame.set_index and Series.unstack:
print (df)
col
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1 = df['col'].str.split(expand=True)
g = df1.groupby(0).cumcount()
df2 = df1.set_index([0, g])[1].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
If the input data has 2 columns:
print (df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
g = df.groupby('col1').cumcount()
df2 = df.set_index(['col1', g])['col2'].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
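The counter is what makes the reshape possible: cumcount numbers the rows inside each group, and that number becomes the row label of the wide frame. A self-contained sketch of the two-column case, using DataFrame.pivot in place of set_index/unstack (a substitution made here for illustration; the column name pos is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"col1": list("AAAABBBB"),
                   "col2": [1, 2, 6, 99, 7, 8, 19, 18]})

# Position of each row within its group: 0, 1, 2, 3 for A, then 0, 1, 2, 3 for B
pos = df.groupby("col1").cumcount()

# Rows of the result are the positions, columns are the group labels
wide = df.assign(pos=pos).pivot(index="pos", columns="col1", values="col2")

# Drop the leftover axis names so the output matches the requested layout
wide.index.name = None
wide.columns.name = None
print(wide)
```

If the groups have different sizes, the shorter ones are padded with NaN, which also turns those columns into floats.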

Pandas Group By two columns and based on the value in one of them (categorical) write data into a specific column [duplicate]

I have the following dataframe:
df = pd.DataFrame([[1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3],['A','B','B','B','C','D','D','E','A','C','C','C','A','B','B','B','B','D','E'], [18,25,47,27,31,55,13,19,73,55,58,14,2,46,33,35,24,60,7]]).T
df.columns = ['Brand_ID','Category','Price']
Brand_ID Category Price
0 1 A 18
1 1 B 25
2 1 B 47
3 1 B 27
4 1 C 31
5 1 D 55
6 1 D 13
7 1 E 19
8 2 A 73
9 2 C 55
10 2 C 58
11 2 C 14
12 3 A 2
13 3 B 46
14 3 B 33
15 3 B 35
16 3 B 24
17 3 D 60
18 3 E 7
What I need to do is group by Brand_ID and Category and count (similar to the first part of this question). However, I need to write each count into a different column depending on the category. My output should look as follows:
Brand_ID Category_A Category_B Category_C Category_D Category_E
0 1 1 3 1 2 1
1 2 1 0 3 0 0
2 3 1 4 0 1 1
Is there any possibility to do this directly with pandas?
Try:
df.groupby(['Brand_ID','Category'])['Price'].count()\
.unstack(fill_value=0)\
.add_prefix('Category_')\
.reset_index()\
.rename_axis([None], axis=1)
Output
Brand_ID Category_A Category_B Category_C Category_D Category_E
0 1 1 3 1 2 1
1 2 1 0 3 0 0
2 3 1 4 0 1 1
OR
pd.crosstab(df.Brand_ID, df.Category)\
.add_prefix('Category_')\
.reset_index()\
.rename_axis([None], axis=1)
You're describing a pivot_table:
df.pivot_table(index='Brand_ID', columns='Category', aggfunc='size', fill_value=0)
Output:
Category A B C D E
Brand_ID
1 1 3 1 2 1
2 1 0 3 0 0
3 1 4 0 1 1
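The pivot_table output above still carries the Category axis name and uses Brand_ID as the index. A sketch of shaping it into the layout asked for in the question, reusing the add_prefix/reset_index/rename_axis steps from the other answer; the frame is built column by column here, which is a choice made for this example so that the numeric columns keep an integer dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "Brand_ID": [1] * 8 + [2] * 4 + [3] * 7,
    "Category": list("ABBBCDDE") + list("ACCC") + list("ABBBBDE"),
    "Price": [18, 25, 47, 27, 31, 55, 13, 19, 73, 55, 58, 14,
              2, 46, 33, 35, 24, 60, 7],
})

out = (
    df.pivot_table(index="Brand_ID", columns="Category",
                   aggfunc="size", fill_value=0)   # count rows per (brand, category)
      .add_prefix("Category_")                     # A -> Category_A, ...
      .reset_index()                               # Brand_ID back to a column
      .rename_axis(None, axis=1)                   # drop the "Category" axis name
)
print(out)
```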

kronecker product pandas dataframes

I have two dataframes
A B
0 1 2
1 1 2
2 1 2
and
C D
0 1 4
1 2 5
2 3 6
I need the mean of the cross products (AC, AD, BC, BD). As such I was hoping to be able to compute
AC AD BC BD
0 1 4 2 8
1 2 5 4 10
2 3 6 6 12
but so far I have been unable to do so. I tried multiply etc, but to no avail. I can do it using loops obviously, but is there an elegant way to do it?
Cheers, Mike
Consider the dataframes d1 and d2:
import numpy as np
import pandas as pd
d1 = pd.DataFrame([[1, 2]] * 3, columns=list('AB'))
d2 = pd.DataFrame(np.arange(1, 7).reshape(2, 3).T, columns=list('CD'))
Then the Kronecker product is
kp = pd.DataFrame(np.kron(d1, d2), columns=pd.MultiIndex.from_product([d1, d2]))
kp
NOTE
This is equivalent to flattening the outer products of each pair of columns, not the row-wise cross products asked for in the question.
For Python 3.7, given dataframes data1 and data2:
def kronecker(data1: 'Dataframe 1', data2: 'Dataframe 2'):
    Combination = pd.DataFrame()
    d1 = pd.DataFrame()
    for i in data2.columns:
        # multiply every column of data1 row-wise by one column of data2
        d1 = data1.multiply(data2[i], axis="index")
        # labels are the data2 name followed by the data1 name, e.g. 'CA'
        d1.columns = [f'{i}{j}' for j in data1.columns]
        Combination = pd.concat([Combination, d1], axis=1)
    return Combination
To complement the answer of @piRSquared, if you want a partial Kronecker product as described in the question (along a single axis), with df1 and df2 being the two dataframes:
import numpy as np
import pandas as pd
pd.DataFrame(np.einsum('nk,nl->nkl', df1, df2).reshape(df1.shape[0], -1),
columns=pd.MultiIndex.from_product([df1, df2]).map(''.join)
)
output:
AC AD BC BD
0 1 4 2 8
1 2 5 4 10
2 3 6 6 12
In contrast, the other answer would give:
AC AD BC BD
0 1 4 2 8
1 2 5 4 10
2 3 6 6 12
3 1 4 2 8
4 2 5 4 10
5 3 6 6 12
6 1 4 2 8
7 2 5 4 10
8 3 6 6 12
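The question ultimately asks for the mean of each product column. A minimal sketch of the row-wise products using plain NumPy broadcasting instead of einsum (a swapped-in technique, shown only for comparison), followed by the means; the names d1, d2, rowwise and means are chosen here for illustration:

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame([[1, 2]] * 3, columns=list("AB"))
d2 = pd.DataFrame({"C": [1, 2, 3], "D": [4, 5, 6]})

# Shapes (n, 2, 1) * (n, 1, 2) -> (n, 2, 2): every d1 column times every d2 column, per row
prod = d1.to_numpy()[:, :, None] * d2.to_numpy()[:, None, :]

# Flatten the last two axes; the label order matches the reshape order: AC, AD, BC, BD
cols = [a + c for a in d1.columns for c in d2.columns]
rowwise = pd.DataFrame(prod.reshape(len(d1), -1), columns=cols, index=d1.index)

# Column means of the products, which is what the question is after
means = rowwise.mean()
print(rowwise)
print(means)
```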