Construct DataFrame from list of dicts

I'm trying to construct a pandas DataFrame from a list of dicts.
List of dicts:
a = [{'1': 'A'},
     {'2': 'B'},
     {'3': 'C'}]
Passing the list of dicts to pd.DataFrame():
df = pd.DataFrame(a)
Actual results:
     1    2    3
0    A  NaN  NaN
1  NaN    B  NaN
2  NaN  NaN    C
Passing explicit column names as well:
pd.DataFrame(a, columns=['Key', 'Value'])
Actual results:
   Key  Value
0  NaN    NaN
1  NaN    NaN
2  NaN    NaN
Expected results:
  Key Value
0   1     A
1   2     B
2   3     C

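Try this:
from collections import ChainMap
# ChainMap iterates from the last mapping to the first (Python 3.7+),
# so reverse the list to keep the original row order.
data = dict(ChainMap(*reversed(a)))
pd.DataFrame(list(data.items()), columns=['Key', 'Value'])
Output:
  Key Value
0   1     A
1   2     B
2   3     C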

Something like this with a list comprehension:
pd.DataFrame([(k, v) for d in a for k, v in d.items()], columns=['Key', 'Value'])
  Key Value
0   1     A
1   2     B
2   3     C
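
For reference, the list-comprehension approach can be run end to end as below. This is a minimal sketch that assumes the same list `a` as in the question; the names `rows` and `df` are only illustrative.

```python
import pandas as pd

a = [{'1': 'A'}, {'2': 'B'}, {'3': 'C'}]

# Flatten every (key, value) pair of every dict into one row.
rows = [(key, value) for d in a for key, value in d.items()]
df = pd.DataFrame(rows, columns=['Key', 'Value'])
print(df)
```

Unlike merging the dicts first, this keeps one row per pair even if the same key appears in more than one dict.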

Related

How to split dictionary column in dataframe and make a new columns for each key values

I have a dataframe with a column that holds multiple key-value pairs, separated by ",".
id  data
0   {'1':A, '2':B, '3':C}
1   {'1':A}
2   {'0':0}
How can I split up the key-value pairs of the 'data' column and make a new column for each key present in it, without removing the original 'data' column?
Desired output:
id  data                   1    2    3    0
0   {'1':A, '2':B, '3':C}  A    B    C    NaN
1   {'1':A}                A    NaN  NaN  NaN
2   {'0':0}                NaN  NaN  NaN  0
Thank you in advance :).

You'll need a regular expression to quote the values, so that each string can be parsed as a Python dict literal with ast.literal_eval. Then pd.json_normalize will do the job nicely:
import ast
df['data'] = df['data'].str.replace(r'(["\'])\s*:(.+?)\s*(,?\s*["\'}])', '\\1:\'\\2\'\\3', regex=True)
df['data'] = df['data'].apply(ast.literal_eval)
df = pd.concat([df, pd.json_normalize(df['data'])], axis=1)
Output:
>>> df
                             data    1    2    3    0
0  {'1': 'A', '2': 'B', '3': 'C'}    A    B    C  NaN
1                      {'1': 'A'}    A  NaN  NaN  NaN
2                      {'0': '0'}  NaN  NaN  NaN    0
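
The steps above can be tried on a small frame. The sketch below assumes the 'data' column holds strings with unquoted values, as the question's sample suggests; the names `quoted`, `parsed`, `expanded` and `out` are illustrative only.

```python
import ast
import pandas as pd

# Assumed input: 'data' holds strings whose values are not quoted.
df = pd.DataFrame({'id': [0, 1, 2],
                   'data': ["{'1':A, '2':B, '3':C}", "{'1':A}", "{'0':0}"]})

# Quote each value so the string becomes a valid Python dict literal.
quoted = df['data'].str.replace(
    r'(["\'])\s*:(.+?)\s*(,?\s*["\'}])', r"\1:'\2'\3", regex=True)
parsed = quoted.apply(ast.literal_eval)

# One new column per key; the original 'data' column is left untouched.
expanded = pd.json_normalize(parsed.tolist())
out = pd.concat([df, expanded], axis=1)
print(out)
```

If the column already contains real dicts rather than strings, the regex and `ast.literal_eval` steps can presumably be skipped and `pd.json_normalize` applied directly.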

regroup uneven number of rows pandas df

I need to regroup a df from the above format into the one below, but my attempt fails and the output shape is (unique number of IDs, 2). Is there a more obvious solution?

You can use groupby with cumcount to number the rows within each ID, then pivot on that counter:
(df.assign(n=df.groupby('ID').cumcount().add(1))
   .pivot(index='ID', columns='n', values='Value')
   .add_prefix('val_')
   .reset_index()
)
Example input:
df = pd.DataFrame({'ID': [7,7,8,11,12,18,22,22,22],
                   'Value': list('abcdefghi')})
Output:
n  ID val_1 val_2 val_3
0   7     a     b   NaN
1   8     c   NaN   NaN
2  11     d   NaN   NaN
3  12     e   NaN   NaN
4  18     f   NaN   NaN
5  22     g     h     i
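
To see why this works, it may help to look at the intermediate counter before pivoting. The sketch below reuses the example input from the answer; the variable names `n` and `wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': [7, 7, 8, 11, 12, 18, 22, 22, 22],
                   'Value': list('abcdefghi')})

# Position of each row within its ID group, starting at 1.
n = df.groupby('ID').cumcount().add(1)
print(n.tolist())

# Each distinct counter value becomes one output column.
wide = (df.assign(n=n)
          .pivot(index='ID', columns='n', values='Value')
          .add_prefix('val_')
          .reset_index())
print(wide)
```

The number of `val_` columns equals the size of the largest ID group, and shorter groups are padded with NaN.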

How to select the rows having same id and have all missing value in another column

I have the following dataframe:
ID  col_1
1   NaN
2   NaN
3   4.0
2   NaN
2   NaN
3   NaN
3   3.0
1   NaN
I need the following output:
ID  col_1
1   NaN
1   NaN
2   NaN
2   NaN
2   NaN
How can I do this in pandas?

You can create a boolean mask with isna, group this mask by ID and transform it using all, then filter the rows with the mask:
mask = df['col_1'].isna().groupby(df['ID']).transform('all')
df[mask].sort_values('ID')
Alternatively, you can use groupby + filter to keep only the groups where all values in col_1 are NaN, but this method should be slower than the above:
df.groupby('ID').filter(lambda g: g['col_1'].isna().all()).sort_values('ID')
   ID  col_1
0   1    NaN
7   1    NaN
1   2    NaN
3   2    NaN
4   2    NaN
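
Both variants can be checked against each other on the question's data. This is a sketch only: the frame is rebuilt by hand from the question, and `kind='mergesort'` is an added assumption to make the row order within each ID deterministic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 2, 2, 3, 3, 1],
                   'col_1': [np.nan, np.nan, 4.0, np.nan,
                             np.nan, np.nan, 3.0, np.nan]})

# True for every row whose ID group contains only NaN in col_1.
mask = df['col_1'].isna().groupby(df['ID']).transform('all')
by_mask = df[mask].sort_values('ID', kind='mergesort')

# Same selection, expressed as a per-group filter.
by_filter = (df.groupby('ID')
               .filter(lambda g: g['col_1'].isna().all())
               .sort_values('ID', kind='mergesort'))

print(by_mask)
print(by_mask.equals(by_filter))
```

ID 3 is dropped entirely because it has at least one non-null value, even though one of its rows is NaN.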

Let us try with isin after groupby with all:
s = df['col_1'].isna().groupby(df['ID']).all()
df = df.loc[df.ID.isin(s[s].index.tolist())]
df
Out[73]:
   ID  col_1
0   1    NaN
1   2    NaN
3   2    NaN
4   2    NaN
7   1    NaN

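I think we can simply take out the rows with null values:
import pandas as pd
df = pd.read_excel(r"D:\Stack_overflow\test12.xlsx")
df1 = df[df['col_1'].isnull()].sort_values(by=['ID'])
Note that this also keeps NaN rows whose ID has non-null values elsewhere (ID 3 in the question), so it does not fully match the requested output.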

How do I make the pandas index of a pivot table part of the column names?

I'm trying to pivot two columns out by another flag column without multi-indexing. I would like the indicator itself to be part of the column names. Take for example:
import pandas as pd
df_dict = {'fire_indicator': [0, 0, 1, 0, 1],
           'cost': [200, 300, 354, 456, 444],
           'value': [1, 1, 2, 1, 1],
           'id': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(df_dict)
If I do the following:
df.pivot_table(index = 'id', columns = 'fire_indicator', values = ['cost','value'])
I get the following:
                 cost         value
fire_indicator      0      1      0    1
id
a               200.0    NaN    1.0  NaN
b               300.0    NaN    1.0  NaN
c                 NaN  354.0    NaN  2.0
d               456.0    NaN    1.0  NaN
e                 NaN  444.0    NaN  1.0
What I'm trying to do is the following:
id  fire_indicator_0_cost  fire_indicator_1_cost  fire_indicator_0_value  fire_indicator_1_value
a   200                    0                      1                       0
b   300                    0                      1                       0
c   0                      354                    0                       2
d   456                    0                      1                       0
e   0                      444                    0                       1
I know there is a way in SAS. Is there a way in Python with pandas?

Just flatten the column names and reset the index:
out = df.pivot_table(index='id', columns='fire_indicator', values=['cost', 'value'])
out.columns = [f'fire_indicator_{y}_{x}' for x, y in out.columns]
# not necessary if you want `id` to be the index
out = out.reset_index()
Output:
    id      fire_indicator_0_cost    fire_indicator_1_cost    fire_indicator_0_value    fire_indicator_1_value
--  ----  -----------------------  -----------------------  ------------------------  ------------------------
 0  a                         200                      nan                         1                       nan
 1  b                         300                      nan                         1                       nan
 2  c                         nan                      354                       nan                         2
 3  d                         456                      nan                         1                       nan
 4  e                         nan                      444                       nan                         1
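
The question asked for 0 rather than NaN in the empty cells. One way to get that, sketched here as an assumption about what the asker wants, is the `fill_value` parameter of `pivot_table`; the loop variable names `name` and `flag` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'fire_indicator': [0, 0, 1, 0, 1],
                   'cost': [200, 300, 354, 456, 444],
                   'value': [1, 1, 2, 1, 1],
                   'id': ['a', 'b', 'c', 'd', 'e']})

# fill_value=0 replaces the missing id/indicator combinations with 0.
out = df.pivot_table(index='id', columns='fire_indicator',
                     values=['cost', 'value'], fill_value=0)

# Flatten the (value_name, indicator) column pairs into single names.
out.columns = [f'fire_indicator_{flag}_{name}' for name, flag in out.columns]
out = out.reset_index()
print(out)
```

Calling `.fillna(0)` on the flattened result should give the same values if the pivot has already been computed without `fill_value`.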

How to do pd.fillna() with condition

I am trying to do a fillna with an if condition:
import pandas as pd
df = pd.DataFrame(data={'a':[1,None,3,None],'b':[4,None,None,None]})
print(df)
# pseudocode: df['b'].fillna(value=0, inplace=True) only if df['a'] is None
print(df)
     a    b
0    1    4
1  NaN  NaN
2    3  NaN
3  NaN  NaN
## What I want to achieve
     a    b
0    1    4
1  NaN    0
2    3  NaN
3  NaN    0
Please help.

You can chain both tests for missing values with & (bitwise AND) and then set the matching values to 0:
df.loc[df.a.isna() & df.b.isna(), 'b'] = 0
# alternative
df.loc[df[['a', 'b']].isna().all(axis=1), 'b'] = 0
print(df)
     a    b
0  1.0  4.0
1  NaN  0.0
2  3.0  NaN
3  NaN  0.0
Or you can use fillna with one condition:
df.loc[df.a.isna(), 'b'] = df.b.fillna(0)
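
As a quick check, the two-condition and one-condition variants can be compared on the question's data. This is a small sketch; the helper `make_df` and the names `df1` and `df2` are hypothetical and only serve to start each variant from a fresh copy.

```python
import pandas as pd

def make_df():
    # Fresh copy of the sample data from the question.
    return pd.DataFrame({'a': [1, None, 3, None], 'b': [4, None, None, None]})

# Variant 1: fill only where both 'a' and 'b' are missing.
df1 = make_df()
df1.loc[df1['a'].isna() & df1['b'].isna(), 'b'] = 0

# Variant 2: where 'a' is missing, assign 'b' with its NaN values filled.
df2 = make_df()
df2.loc[df2['a'].isna(), 'b'] = df2['b'].fillna(0)

print(df1)
print(df1.equals(df2))
```

The two agree because `fillna` leaves existing non-null values of 'b' unchanged, so the extra test on 'b' only makes the intent explicit.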
