One-to-one mapping of two lists to form a dataframe - pandas

I have elements in two lists:
list1 = [['a','b','c'],['a','d'],['a','f','c']]
list2 = [['S1','S2','S3'],['S1','S3'],['S1','S2','S3']]
I want to map these two lists into a dataframe such that list2 provides the column name for each value in list1; where a row has no value for a column, print null or 0:
S1 S2 S3
a b c
a 0 d
a f c

Just concatenate with a list comprehension:
(pd.concat([pd.DataFrame([a], columns=b)
            for a, b in zip(list1, list2)],
           ignore_index=True)
   .fillna(0)
)
Output:
S1 S2 S3
0 a b c
1 a 0 d
2 a f c

Zip the lists, then zip the elements of the lists.
pd.DataFrame.from_records([dict(zip(*z)) for z in zip(list2, list1)]).fillna(0)
S1 S2 S3
0 a b c
1 a 0 d
2 a f c
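To see what each dict(zip(*z)) builds, here is the intermediate result for the second pair (a small illustrative check):
z = (['S1', 'S3'], ['a', 'd'])  # one pair produced by zip(list2, list1)
dict(zip(*z))                   # {'S1': 'a', 'S3': 'd'}
from_records then aligns the dicts by key, leaving NaN for the missing 'S2' until fillna(0) replaces it.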

Multimatch join in pandas

I am looking to join two dataframes on one column and, if there is a multi-match, append the results to another column.
NB. using a different example as yours is not reproducible.
You can convert with str.upper, split and explode, map the values, then groupby.agg to join them back into a single string:
mapper = df2.set_index('name')['ID'].astype(str)
df1['ID'] = (df1['name']
             .str.upper().str.split(',')
             .explode()
             .map(mapper)
             .groupby(level=0).agg(','.join)
)
Or, with a list comprehension:
mapper = df2.set_index('name')['ID'].astype(str)
df1['ID'] = [','.join([mapper[x] for x in s.upper().split(',') if x in mapper])
             for s in df1['name']]
output:
name ID
0 A 1
1 b 2
2 A,B 1,2
3 C,a 3,1
4 D 4
Used input:
# df1
name
0 A
1 b
2 A,B
3 C,a
4 D
# df2
name ID
0 A 1
1 B 2
2 C 3
3 D 4
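For reference, the used input above can be reconstructed with:
import pandas as pd

df1 = pd.DataFrame({'name': ['A', 'b', 'A,B', 'C,a', 'D']})
df2 = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'ID': [1, 2, 3, 4]})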

Map Dictionary of Lists to pd Dataframe Column and Create Repeat Rows Based on n Number of List Contents

I am trying to use the following two components: 1) a dictionary of lists and 2) a dataframe column composed of the dictionary keys. I would like to map n values to their corresponding key in the existing pandas column, creating duplicate rows based on the number of list items. I would like to maintain this as a df and not convert to a series.
ex. dictionary
d = {'a': ['i','ii'], 'b': ['iii','iv'], 'c': ['v','vi','vii']}
ex. dataframe columns
Column1 Column2
0 g a
1 h b
2 i c
desired output:
Column1 Column2 Column3
0 g a i
1 g a ii
2 h b iii
3 h b iv
4 i c v
5 i c vi
6 i c vii
What if another dictionary had to be mapped similarly to these three columns from the output? Say, with the following dictionary:
d2 = {'i': ['A'], 'ii': ['B'], 'iii': ['C','D'], 'iv': ['E'], 'v': ['F'], 'vi': ['G'], 'vii': ['H','I','J']}
What if the dictionary was in df format?
Any help would be much appreciated! Thank you!
Use map to create a new column, then explode the lists into rows:
df = df.assign(Column3=df['Column2'].map(d))
df = df.explode('Column3')
df
Column1 Column2 Column3
0 g a i
0 g a ii
1 h b iii
1 h b iv
2 i c v
2 i c vi
2 i c vii
Follow the same pattern to map d2 onto Column3:
df = df.assign(Column4=df['Column3'].map(d2))
df = df.explode('Column4')
df
Column1 Column2 Column3 Column4
0 g a i A
0 g a ii B
1 h b iii C
1 h b iii D
1 h b iv E
2 i c v F
2 i c vi G
2 i c vii H
2 i c vii I
2 i c vii J
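For reference, a runnable setup for this example (a minimal sketch of the inputs described in the question):
import pandas as pd

df = pd.DataFrame({'Column1': ['g', 'h', 'i'], 'Column2': ['a', 'b', 'c']})
d = {'a': ['i', 'ii'], 'b': ['iii', 'iv'], 'c': ['v', 'vi', 'vii']}
d2 = {'i': ['A'], 'ii': ['B'], 'iii': ['C', 'D'], 'iv': ['E'],
      'v': ['F'], 'vi': ['G'], 'vii': ['H', 'I', 'J']}
To get a fresh 0..n-1 index as in the desired output, finish with df = df.reset_index(drop=True).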

How to modify groups of a grouped pandas dataframe

I have this dataframe:
s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})
that is like this:
A B C
0 1 a A
1 1 b B
2 1 c C
3 2 d D
4 2 e E
5 2 f F
6 2 g G
I want to do a groupby like this:
groups = s.groupby("A")
for example, the group 2 is:
g2 = groups.get_group("2")
that looks like this:
A B C
3 2 d D
4 2 e E
5 2 f F
6 2 g G
Anyway, I want to do some operation in each group.
Let me show how my final result should be:
A B C D
1 1 b B a=b;A=B
2 1 c C a=c;A=C
4 2 e E d=e;D=E
5 2 f F d=f;D=F
6 2 g G d=g;D=G
Actually, I am dropping the first row in each group but combining it with the other rows of the group to create column D
Any idea how to do this?
Summary of what I want to do in two lines:
I want to do a groupby and, in each group, drop the first row. I also want to add a column to the whole dataframe that is based on the rows of the group
What I have tried:
In order to solve this, I am going to create a function:
def func(g):
    first_row_of_group = g.iloc[0]
    g = g.iloc[1:]
    g["D"] = g.apply(lambda row: ";".join([f'{a}={b}' for a, b in zip(row, first_row_of_group)]), axis=1)
    return g
Then I am going to do this:
groups.apply(lambda g: func(g))
You can apply a custom function to each group that combines the elements of the first row with the remaining rows and then removes it:
def remove_first(x):
    first = x.iloc[0]
    x = x.iloc[1:]
    x['D'] = first['B'] + '=' + x['B'] + ';' + first['C'] + '=' + x['C']
    # an equivalent operation:
    # x['D'] = first.iloc[1] + '=' + x.iloc[:, 1] + ';' + first.iloc[2] + '=' + x.iloc[:, 2]
    return x

s = s.groupby('A').apply(remove_first).droplevel(0)
Output:
A B C D
1 1 b B a=b;A=B
2 1 c C a=c;A=C
4 2 e E d=e;D=E
5 2 f F d=f;D=F
6 2 g G d=g;D=G
Note: The dataframe shown in your question is constructed from
s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})
but you give a different one as raw input.
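An equivalent vectorized alternative (a sketch, not part of the original answer): broadcast each group's first row with groupby.transform('first'), build D against it, then drop each group's first row:
# first B/C value of each group, aligned to every row of that group
firsts = s.groupby('A')[['B', 'C']].transform('first')
out = s.assign(D=firsts['B'] + '=' + s['B'] + ';' + firsts['C'] + '=' + s['C'])
# keep everything except each group's first row
out = out[s.groupby('A').cumcount() > 0]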

How to create a network graph based on a simple dataframe

I am wondering how I can create an edge list (from, to) based on this type of data. Both columns are inside a pandas dataframe and their type is string.
Name  Co-Workers
A     A,B,C,D
B     A,B,C,D
C     A,B,C,E
D     A,B,D,E
E     C,D,E
I also want to remove self-connections like A-A, B-B, C-C, and so on.
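For reference, the frame above can be constructed with:
import pandas as pd

df = pd.DataFrame({
    'Name': list('ABCDE'),
    'Co-Workers': ['A,B,C,D', 'A,B,C,D', 'A,B,C,E', 'A,B,D,E', 'C,D,E'],
})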
IIUC, you can explode your data and filter it:
df2 = df.copy()
df2['Co-Workers'] = df['Co-Workers'].str.split(',')
df2 = df2.explode('Co-Workers')
df2 = df2[df2['Name'].ne(df2['Co-Workers'])]
df2
output:
Name Co-Workers
0 A B
0 A C
0 A D
1 B A
1 B C
1 B D
2 C A
2 C B
2 C E
3 D A
3 D B
3 D E
4 E C
4 E D
First, split the column from a string into a list of separate values.
Second, explode the column.
Third, create a directed graph.
Process the data with mozway's code above, and then:
import networkx as nx
from matplotlib.pyplot import figure

# df2: the exploded frame with self-connections removed (see above)
G = nx.from_pandas_edgelist(df2, source='Name', target='Co-Workers')
figure(figsize=(10, 8))
nx_graph = nx.compose(nx.DiGraph(), G)
nx.draw_shell(nx_graph, with_labels=True)
Result graph:

Append two pandas dataframes with different shapes in a for loop using python or pandasql

I have two dataframes such as:
df1:
id A B C D
1 a b c d
1 e f g h
1 i j k l
df2:
id A C D
2 x y z
2 u v w
The final outcome should be:
id A B C D
1 a b c d
1 e f g h
1 i j k l
2 x y z
2 u v w
These tables are generated in a for loop from json files, so I have to keep appending them one below another.
Note: the two dataframes' 'id' columns are always different.
My approach:
data is a dataframe in which column 'X' holds JSON data; it also has an 'id' column.
df1 = pd.DataFrame()
for i, row1 in data.head(2).iterrows():
    df2 = pd.io.json.json_normalize(row1["X"])
    df2.columns = df2.columns.map(lambda x: x.split(".")[-1])
    df2["id"] = [row1["id"] for i in range(df2.shape[0])]
    if len(df1) == 0:
        df1 = df2.copy()
    df1 = pd.concat((df1, df2), ignore_index=True)
Error: AssertionError: Number of manager items must equal union of block items # manager items: 46, # tot_items: 49
How can I solve this using python or pandas sql?
You can use pd.concat to concatenate the two dataframes, like:
>>> pd.concat((df1, df2), ignore_index=True)
id A B C D
0 1 a b c d
1 1 e f g h
2 1 i j k l
3 2 x NaN y z
4 2 u NaN v w
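If the frames are built in a loop as in the question, collecting them in a list and concatenating once at the end is usually cleaner and faster than concatenating inside the loop (a sketch based on the question's code; pd.json_normalize is the current spelling of pd.io.json.json_normalize):
frames = []
for i, row1 in data.iterrows():
    tmp = pd.json_normalize(row1["X"])
    tmp.columns = tmp.columns.map(lambda x: x.split(".")[-1])
    tmp["id"] = row1["id"]
    frames.append(tmp)

result = pd.concat(frames, ignore_index=True)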