How to modify groups of a grouped pandas dataframe

I have this dataframe:
s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})
that is like this:
A B C
0 1 a A
1 1 b B
2 1 c C
3 2 d D
4 2 e E
5 2 f F
6 2 g G
I want to do a groupby like this:
groups = s.groupby("A")
for example, the group 2 is:
g2 = groups.get_group("2")
that looks like this:
A B C
3 2 d D
4 2 e E
5 2 f F
6 2 g G
Anyway, I want to do some operation in each group.
Let me show how my final result should be:
A B C D
1 1 b B a=b;A=B
2 1 c C a=c;A=C
4 2 e E d=e;D=E
5 2 f F d=f;D=F
6 2 g G d=g;D=G
In effect, I am dropping the first row of each group, but combining it with the remaining rows of the group to create column D.
Any idea how to do this?
Summary of what I want to do in two lines:
I want to do a group by and in each group, I want to drop the first row. I also want to add a column to the whole dataframe that is based on the rows of the group
What I have tried:
In order to solve this, I am going to create a function:
def func(g):
    first_row_of_group = g.iloc[0]
    g = g.iloc[1:]
    g["C"] = g.apply(lambda row: ";".join([f'{a}={b}' for a, b in zip(row, first_row_of_group)]))
    return g
Then I am going to do this:
groups.apply(lambda g: func(g))

You can apply a custom function to each group where you add the elements from the first row to the remaining rows and remove it:
def remove_first(x):
    first = x.iloc[0]
    # copy the remaining rows so that adding a column does not write to a slice
    x = x.iloc[1:].copy()
    x['D'] = first['B'] + '=' + x['B'] + ';' + first['C'] + '=' + x['C']
    # an equivalent operation
    # x['D'] = first.iloc[1] + '=' + x.iloc[:,1] + ';' + first.iloc[2] + '=' + x.iloc[:,2]
    return x
s = s.groupby('A').apply(remove_first).droplevel(0)
Output:
A B C D
1 1 b B a=b;A=B
2 1 c C a=c;A=C
4 2 e E d=e;D=E
5 2 f F d=f;D=F
6 2 g G d=g;D=G
Note: The dataframe shown in your question is constructed from
s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})
but you give a different one as raw input.
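As an alternative to the custom function, the same result can be sketched without groupby.apply. This is only an illustration, assuming the first row of each group is simply the first occurrence of its key in column A; the names first, combined, keep and result are made up for the example.

```python
import pandas as pd

s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})

# Broadcast the first B and C of every group back onto all rows of that group.
first = s.groupby('A')[['B', 'C']].transform('first')

# Build the combined label for every row (including the first rows, dropped below).
combined = first['B'] + '=' + s['B'] + ';' + first['C'] + '=' + s['C']

# duplicated() is False on the first occurrence of each key and True afterwards,
# so it selects exactly the rows that are not the first row of their group.
keep = s['A'].duplicated()
result = s.assign(D=combined)[keep]
print(result)
```

Because nothing is re-assembled from per-group pieces, the original index is preserved and no droplevel call is needed.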

Related

Map Dictionary of Lists to pd Dataframe Column and Create Repeat Rows Based on n Number of List Contents

I am trying to use the following two components: 1) a dictionary of lists and 2) a dataframe column composed of the dictionary keys. I would like to map each key in the existing pandas column to its n list values, creating one duplicate row per list element. I would like to keep the result as a DataFrame and not convert it to a Series.
ex. dictionary
d = {'a':['i','ii'],'b':['iii','iv'],'c':['v','vi','vii']}
ex. dataframe columns
Column1 Column2
0 g a
1 h b
2 i c
desired output:
Column1 Column2 Column3
0 g a i
1 g a ii
2 h b iii
3 h b iv
4 i c v
5 i c vi
6 i c vii
What if another dictionary had to be mapped similarly to these three columns from the output? Say, with the following dictionary:
d2 = {'i':['A'],'ii':['B'],'iii':['C','D'],'iv':['E'],'v':['F'],'vi':['G'],'vii':['H','I','J']}
What if the dictionary was in df format?
Any help would be much appreciated! Thank you!

Use map to create a new column holding the lists, and then explode that column into rows:
df=df.assign(Column3 = df['Column2'].map( d))
df.explode('Column3')
Column1 Column2 Column3
0 g a i
0 g a ii
1 h b iii
1 h b iv
2 i c v
2 i c vi
2 i c vii
Follow the same approach to map d2 onto Column3:
df=df.assign(Column4 = df['Column3'].map( d2))
df=df.explode('Column4')
df
Column1 Column2 Column3 Column4
0 g a i A
0 g a ii B
1 h b iii C
1 h b iii D
1 h b iv E
2 i c v F
2 i c vi G
2 i c vii H
2 i c vii I
2 i c vii J
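The question also asks what to do if the dictionary is in DataFrame form, which the answer above does not cover. A minimal sketch, assuming the mapping is stored as a long-format lookup table with one row per key/value pair (the lookup frame and its layout are hypothetical), is a left merge. The block also shows reset_index to get the 0..6 row labels from the desired output.

```python
import pandas as pd

df = pd.DataFrame({'Column1': ['g', 'h', 'i'], 'Column2': ['a', 'b', 'c']})
d = {'a': ['i', 'ii'], 'b': ['iii', 'iv'], 'c': ['v', 'vi', 'vii']}

# map + explode as in the answer, then renumber the rows 0..n-1
out = (df.assign(Column3=df['Column2'].map(d))
         .explode('Column3')
         .reset_index(drop=True))

# Hypothetical lookup table: the same mapping, one row per (key, value) pair
lookup = pd.DataFrame(
    [(key, value) for key, values in d.items() for value in values],
    columns=['Column2', 'Column3'],
)

# A left merge repeats each row of df once per matching lookup row
merged = df.merge(lookup, on='Column2', how='left')
print(merged)
```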

How to Create a network graph based on a simple DataFrame

I am wondering how I can create an Edge list (from, to) based on this type of data. Both columns are inside a pandas data frame and the type is string.
Name Co-Workers
A A,B,C,D
B A,B,C,D
C A,B,C,E
D A,B,D,E
E C,D,E
I also want to remove self-connections such as A-A, B-B, C-C, ...

IIUC, you can explode your data and filter it:
df2 = df.copy()
df2['Co-Workers'] = df['Co-Workers'].str.split(',')
df2 = df2.explode('Co-Workers')
df2[df2['Name'].ne(df2['Co-Workers'])]
output:
Name Co-Workers
0 A B
0 A C
0 A D
1 B A
1 B C
1 B D
2 C A
2 C B
2 C E
3 D A
3 D B
3 D E
4 E C
4 E D
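For reference, here is a self-contained version of the steps above, built on the table from the question. The last two lines are an optional extra that the question does not ask for: they collapse A-B and B-A into a single pair, on the assumption that the connections are meant to be undirected. The names edges, pairs and undirected are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Co-Workers': ['A,B,C,D', 'A,B,C,D', 'A,B,C,E', 'A,B,D,E', 'C,D,E'],
})

# Split the comma-separated string into a list, then emit one row per co-worker.
edges = (df.assign(**{'Co-Workers': df['Co-Workers'].str.split(',')})
           .explode('Co-Workers'))

# Drop self-connections such as A-A.
edges = edges[edges['Name'] != edges['Co-Workers']].reset_index(drop=True)

# Optional: treat A-B and B-A as the same undirected edge.
pairs = {frozenset(pair) for pair in zip(edges['Name'], edges['Co-Workers'])}
undirected = sorted(tuple(sorted(pair)) for pair in pairs)
print(edges)
print(undirected)
```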

First, split the column from a string into a list of separate values.
Second, explode the column.
Third, create a directed graph.
Process the data with the code from mozway's answer above, and then:
import networkx as nx
from matplotlib.pyplot import figure
# build the directed graph directly from the edge list
G = nx.from_pandas_edgelist(df2, source='Name', target='Co-Workers', create_using=nx.DiGraph())
figure(figsize=(10, 8))
nx.draw_shell(G, with_labels=True)
Result graph: (image not included)

Split Column into Unknown Number of Columns by Delimiter in Pandas Dataframe

I have this table, in which the strings are delimited by "+":
ID Products
1 A + B + C + D + E ...
2 A + F + G
3 X + D
I would like to return in this format
ID Products Product 1 Product 2 Product 3 Product 4 Product 5 product...
1 A + B + C + D + E ... A B C D E ...
2 A + F + G A F G
3 X + D X D
1 D + C + C + D + E D C C D E
How can I reproduce this in a pandas DataFrame?

Use Series.str.split with the regex r'\s+\+\s+' (one or more whitespace characters, an escaped +, then one or more whitespace characters), then change the column names with DataFrame.add_prefix and finally add the result to the original with DataFrame.join:
df1 = df['Products'].str.split(r'\s+\+\s+', expand=True).add_prefix('Product').fillna('')
df = df.join(df1)
print (df)
ID Products Product0 Product1 Product2 Product3 Product4
0 1 A + B + C + D + E A B C D E
1 2 A + F + G A F G
2 3 X + D X D
Also if necessary change column names:
d = lambda x: f'Product{x+1}'
df = (df.join(df['Products'].str.split(r'\s+\+\s+', expand=True)
                            .rename(columns=d)
                            .fillna('')))
print (df)
ID Products Product1 Product2 Product3 Product4 Product5
0 1 A + B + C + D + E A B C D E
1 2 A + F + G A F G
2 3 X + D X D
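Here is a self-contained sketch of the same idea on the sample data, using the "Product 1", "Product 2", ... column names from the question. It assumes that optional whitespace around "+" is acceptable, so the pattern is r'\s*\+\s*' instead of the stricter one above. The number of new columns follows from the longest row, so it does not need to be known in advance.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Products': ['A + B + C + D + E', 'A + F + G', 'X + D'],
})

# A multi-character pattern is treated as a regular expression by str.split;
# expand=True returns one column per part, padded with missing values.
parts = df['Products'].str.split(r'\s*\+\s*', expand=True)

# Name the columns "Product 1", "Product 2", ... for however many parts exist.
parts.columns = [f'Product {i + 1}' for i in range(parts.shape[1])]

out = df.join(parts.fillna(''))
print(out)
```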

Append two pandas dataframe with different shapes and in for loop using python or pandasql

I have two dataframes such as:
df1:
id A B C D
1 a b c d
1 e f g h
1 i j k l
df2:
id A C D
2 x y z
2 u v w
The final outcome should be:
id A B C D
1 a b c d
1 e f g h
1 i j k l
2 x y z
2 u v w
These tables are generated in a for loop from JSON files, so I have to keep appending them one below another.
Note: Two dataframes 'id' column is always different.
My approach:
data is a dataframe in which column 'X' holds JSON data; it also has an "id" column.
df1=pd.DataFrame()
for i, row1 in data.head(2).iterrows():
    df2= pd.io.json.json_normalize(row1["X"])
    df2.columns = df2.columns.map(lambda x: x.split(".")[-1])
    df2["id"]=[row1["id"] for i in range(df2.shape[0])]
    if len(df1)==0:
        df1=df2.copy()
    df1=pd.concat((df1,df2), ignore_index=True)
Error: AssertionError: Number of manager items must equal union of block items # manager items: 46, # tot_items: 49
How to solve this using python or pandas sql.

You can use pd.concat to concatenate the two dataframes; columns missing from one of them are filled with NaN:
>>> pd.concat((df1, df2), ignore_index=True)
id A B C D
0 1 a b c d
1 1 e f g h
2 1 i j k l
3 2 x NaN y z
4 2 u NaN v w
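Since the question builds the tables inside a loop, a common pattern is to collect the pieces in a list and call pd.concat once at the end, instead of concatenating on every iteration. A minimal sketch, with df1 and df2 from the question standing in for the per-record frames (the frames list is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 1], 'A': list('aei'), 'B': list('bfj'),
                    'C': list('cgk'), 'D': list('dhl')})
df2 = pd.DataFrame({'id': [2, 2], 'A': list('xu'), 'C': list('yv'), 'D': list('zw')})

frames = []
for part in (df1, df2):  # stand-in for the frames produced inside the real loop
    frames.append(part)

# One concat at the end; columns are aligned by name and gaps become NaN.
result = pd.concat(frames, ignore_index=True)
print(result)
```

This also avoids the special case for the empty starting frame in the original loop.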

How to aggregate a column by a value on another column?

Suppose I have the following df.
df = pd.DataFrame({
'A':['x','y','x','y'],
'B':['a','b','a','b'],
'C':[1,10,100,1000],
'D':['w','v','v','w']
})
A B C D
0 x a 1 w
1 y b 10 v
2 x a 100 v
3 y b 1000 w
I want to group by columns A and B, sum column C, and keep the value of D from the row that holds the group's maximum C. Like this:
A B C D
x a 101 v
y b 1010 w
So far, I have this:
df.groupby(['A','B']).agg({'C':sum})
A B C
x a 101
y b 1010
What function do I have to aggregate column D with?

You can use DataFrameGroupBy.idxmax for indices of max values of C with loc:
#unique index
df.reset_index(drop=True, inplace=True)
df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax'])
df1['idxmax'] = df.loc[df1['idxmax'], 'D'].values
df1 = df1.rename(columns={'idxmax':'D','sum':'C'}).reset_index()
Similar solution with map:
df1 = df.groupby(['A','B'])['C'].agg(['sum', 'idxmax']).reset_index()
df1['idxmax'] = df1['idxmax'].map(df['D'])
df1 = df1.rename(columns={'idxmax':'D','sum':'C'})
print (df1)
A B C D
0 x a 101 v
1 y b 1010 w

Set D as the index before you group by, so that idxmax returns the D value directly:
df.set_index('D').groupby(['A','B']).C.agg(['sum','idxmax']).\
reset_index().rename(columns={'idxmax':'D','sum':'C'})
Out[407]:
A B C D
0 x a 101 v
1 y b 1010 w
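Another way to get the same table, shown only as a sketch: sort by C first, so that the last row of each group is the one with the maximum C, then use named aggregation. This assumes that ties on the maximum can be broken arbitrarily.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x', 'y', 'x', 'y'],
    'B': ['a', 'b', 'a', 'b'],
    'C': [1, 10, 100, 1000],
    'D': ['w', 'v', 'v', 'w'],
})

# After sorting by C, 'last' picks D from the row with the largest C in each group.
out = (df.sort_values('C')
         .groupby(['A', 'B'], as_index=False)
         .agg(C=('C', 'sum'), D=('D', 'last')))
print(out)
```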
