python complex list object to dataframe - pandas

I want to create a dataframe by expanding the child 'fields' object along with the parent keys. Trying pd.DataFrame(lst) does not work: it creates a data frame with only three columns and keeps the child object in a single column.
Is it possible to do this in one line instead of iterating through the list to expand each child object? Thank you in advance.
I have a list object in Python like this:
lst = [
    {
        'id': 'rec1',
        'fields': {
            'iso': 'US',
            'name': 'U S',
            'lat': '38.9051',
            'lon': '-77.0162'
        },
        'createdTime': '2021-03-16T13:03:24.000Z'
    },
    {
        'id': 'rec2',
        'fields': {'iso': 'HK', 'name': 'China', 'lat': '0.0', 'lon': '0.0'},
        'createdTime': '2021-03-16T13:03:24.000Z'
    }
]
Expected dataframe:

Use json_normalize:
df = pd.json_normalize(lst)
print (df)
id createdTime fields.iso fields.name fields.lat fields.lon
0 rec1 2021-03-16T13:03:24.000Z US U S 38.9051 -77.0162
1 rec2 2021-03-16T13:03:24.000Z HK China 0.0 0.0
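If the "fields." prefixes in the column names are unwanted, they can be stripped after normalizing. A small sketch on the same sample data (the rename step is just one way to do it):

```python
import pandas as pd

lst = [
    {'id': 'rec1',
     'fields': {'iso': 'US', 'name': 'U S', 'lat': '38.9051', 'lon': '-77.0162'},
     'createdTime': '2021-03-16T13:03:24.000Z'},
    {'id': 'rec2',
     'fields': {'iso': 'HK', 'name': 'China', 'lat': '0.0', 'lon': '0.0'},
     'createdTime': '2021-03-16T13:03:24.000Z'},
]

df = pd.json_normalize(lst)
# drop the "fields." prefix from the nested columns
df.columns = [c.split('.')[-1] for c in df.columns]
print(df.columns.tolist())
```

This only works cleanly when the nested names do not collide with the top-level keys.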

Related

Pandas boolean condition from nested list of dictionaries

[{'id': 123,
  'type': 'salary',   # Parent node
  'tx': 'house',
  'sector': 'EU',
  'transition': [{'id': 'hash',   # Child node
                  'id': 123,
                  'type': 'salary',
                  'tx': 'house'}]},
 {'userid': 123,
  'type': 'salary',   # Parent node
  'tx': 'office',
  'transition': [{'id': 'hash',   # Child node
                  'id': 123,
                  'type': 'salary',
                  'tx': 'office'}]}]
As a pandas column ('info') I have some information stored as a nested list of dictionaries like the example above.
What I'm trying to do is build a boolean condition checking whether this list has the following attributes:
More than one 'type' == 'salary' among the parent nodes
Field 'tx' differs among the parent nodes with 'type' == 'salary'
So far I've tried to flatten the list and filter, but that does not handle the first and second conditions:
a = df.iloc[0].info
values = [item for sublist in [[list(i.values()) for i in a]][0] for item in sublist]
If you want a one-line solution, you can use:
df['check'] = df['info'].apply(lambda x: True if sum([1 if i['type']=='salary' else 0 for i in x]) > 1 and [i['tx'] for i in x if i['type']=='salary'].count([i['tx'] for i in x if i['type']=='salary'][0]) != len([i['tx'] for i in x if i['type']=='salary']) else False)
or (expanded):
def check(x):
    total_salary = sum(1 for i in x if i['type'] == 'salary')  # count of "type": "salary" matches
    tx_list = [i['tx'] for i in x if i['type'] == 'salary']    # tx values when type == salary
    tx_check = tx_list.count(tx_list[0]) != len(tx_list)       # True when the tx values are not all the same
    return total_salary > 1 and tx_check

df['check'] = df['info'].apply(check)
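Run against a one-row frame built from the parent nodes in the sample data, this returns True, since there are two salary parents with different 'tx' values. A minimal sketch (the helper mirrors the expanded function; the 'transition' children are omitted because the check only reads parent keys):

```python
import pandas as pd

def check(x):
    # more than one salary parent, and their 'tx' values are not all the same
    total_salary = sum(1 for i in x if i['type'] == 'salary')
    tx_list = [i['tx'] for i in x if i['type'] == 'salary']
    return total_salary > 1 and tx_list.count(tx_list[0]) != len(tx_list)

info = [
    {'id': 123, 'type': 'salary', 'tx': 'house', 'sector': 'EU'},
    {'userid': 123, 'type': 'salary', 'tx': 'office'},
]
df = pd.DataFrame({'info': [info]})
df['check'] = df['info'].apply(check)
print(df['check'].tolist())  # [True]
```

Note tx_list[0] will raise an IndexError for rows with no salary parents at all; guard for that case if the real data contains such rows.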

Conditional mapping among columns of two data frames with Pandas Data frame

I need your advice on how to map columns between data frames.
I have put it simply so that it's easier for you to understand:
df = dataframe
EXAMPLE:
df1 = pd.DataFrame({
    "X": [],
    "Y": [],
    "Z": []
})
df2 = pd.DataFrame({
    "A": ['', '', 'A1'],
    "C": ['', '', 'C1'],
    "D": ['D1', 'Other', 'D3'],
    "F": ['', '', ''],
    "G": ['G1', '', 'G3'],
    "H": ['H1', 'H2', 'H3']
})
Requirement:
1st step:
Take the value for column X of df1 from columns A, C, D, in that order. The search should stop as soon as it finds any value and select it.
2nd step:
If the selected value is "Other", then column X of df1 should look through columns F, G, and H, in that order, until it finds any value.
Result:
X
0 D1
1 H2
2 A1
Thank you so much in advance
Try this:
def first_non_empty(df, cols):
    """Return the first non-empty, non-null value among the specified columns per row."""
    return df[cols].replace('', pd.NA).bfill(axis=1).iloc[:, 0]
col_x = first_non_empty(df2, ['A','C','D'])
col_x = col_x.mask(col_x == 'Other', first_non_empty(df2, ['F','G','H']))
df1['X'] = col_x
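As a quick sanity check, the helper reproduces the expected result on the sample df2 above. A self-contained sketch:

```python
import pandas as pd

def first_non_empty(df, cols):
    # scan cols left to right, take the first non-empty value per row
    return df[cols].replace('', pd.NA).bfill(axis=1).iloc[:, 0]

df2 = pd.DataFrame({
    "A": ['', '', 'A1'],
    "C": ['', '', 'C1'],
    "D": ['D1', 'Other', 'D3'],
    "F": ['', '', ''],
    "G": ['G1', '', 'G3'],
    "H": ['H1', 'H2', 'H3'],
})

col_x = first_non_empty(df2, ['A', 'C', 'D'])
col_x = col_x.mask(col_x == 'Other', first_non_empty(df2, ['F', 'G', 'H']))
print(col_x.tolist())  # ['D1', 'H2', 'A1']
```

The mask call swaps in the F/G/H fallback only for rows where the first pass landed on "Other".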

Pandas Dataframe from a nested dictionary with list as values

I'm new to Python and pandas, and I can't figure out a way to push this dict into a dataframe:
a_dict = {'position': [{'points': '57.95', 'name': 'Def'},
                       {'points': '121', 'name': 'PK'},
                       {'points': '383.1', 'name': 'RB'},
                       {'points': '299.96', 'name': 'QB'},
                       {'points': '177.8', 'name': 'TE'},
                       {'points': '616.42', 'name': 'WR'}],
          'id': 'MIN'}
I have tried multiple FOR loops to iterate through the dict but the list keeps me from organizing it. The data is originally in a JSON format. Thank you!
I'm guessing you want the points and names as columns:
points = []
name = []
for dct in a_dict['position']:
    points.append(dct['points'])
    name.append(dct['name'])
pd.DataFrame({'points': points, 'name': name})
With the output
points name
0 57.95 Def
1 121 PK
2 383.1 RB
3 299.96 QB
4 177.8 TE
5 616.42 WR
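Since a_dict['position'] is already a list of flat dicts, the DataFrame constructor can also build this directly, without the explicit loop (shown here on a shortened copy of the sample dict):

```python
import pandas as pd

a_dict = {'position': [{'points': '57.95', 'name': 'Def'},
                       {'points': '121', 'name': 'PK'},
                       {'points': '383.1', 'name': 'RB'}],
          'id': 'MIN'}

# a list of dicts maps straight onto rows; the keys become columns
df = pd.DataFrame(a_dict['position'])
print(df)
```

This gives the same points/name columns as the loop above, in one line.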

Passing Dataframe column as variable to a dict in a loop

I have a Dataframe with 2 columns as below:
name, type
prod_a, fruit
prod_b, vegetable
prod_c, fruit
I am trying to pass these two columns to the below dict in a loop:
data = {"name": df['name'],
        "accountId": df['type']}
How could I pass values from the Dataframe into the dict data above?
If you want to loop over each row and create the dictionaries separately, use:
for x, y in df[['name','type']].values:
    data = {"name": x, "accountId": y}
    print(data)
{'name': 'prod_a', 'accountId': 'fruit'}
{'name': 'prod_b', 'accountId': 'vegetable'}
{'name': 'prod_c', 'accountId': 'fruit'}
Or rename the column and use DataFrame.to_dict with method 'records' (the 'r' shorthand is no longer accepted by recent pandas):
for data in df[['name','type']].rename(columns={'type':'accountId'}).to_dict('records'):
    print(data)
{'name': 'prod_a', 'accountId': 'fruit'}
{'name': 'prod_b', 'accountId': 'vegetable'}
{'name': 'prod_c', 'accountId': 'fruit'}
If you need a single dict of lists instead, use DataFrame.to_dict with method 'list':
data = df[['name','type']].rename(columns={'type':'accountId'}).to_dict('list')
print(data)
{'name': ['prod_a', 'prod_b', 'prod_c'],
'accountId': ['fruit', 'vegetable', 'fruit']}
IIUC:
df = pd.DataFrame({
    'name': ['prod_a', 'prod_b', 'prod_c'],
    'type': ['fruit', 'vegetable', 'fruit']
})
data = dict()
for i in df.columns:
    data.update({('accountId' if i == 'type' else i): list(df[i])})
print(data)
{'name': ['prod_a', 'prod_b', 'prod_c'],
'accountId': ['fruit', 'vegetable', 'fruit']}
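The same loop can be condensed into a dict comprehension, which avoids the explicit update calls (a minor stylistic alternative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['prod_a', 'prod_b', 'prod_c'],
    'type': ['fruit', 'vegetable', 'fruit'],
})

# rename 'type' to 'accountId' while turning each column into a list
data = {('accountId' if c == 'type' else c): df[c].tolist() for c in df.columns}
print(data)
```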

Collapsing a PANDAs dataframe into a single column of all items and their occurances

I have a data frame consisting of a mixture of NaN's and strings, e.g.
data = {'String1': ['NaN', 'tree', 'car', 'tree'],
        'String2': ['cat', 'dog', 'car', 'tree'],
        'String3': ['fish', 'tree', 'NaN', 'tree']}
ddf = pd.DataFrame(data)
I want to:
1: count the total number of items and put the counts in a new data frame, e.g.
NaN=2
tree=5
car=2
fish=1
cat=1
dog=1
2: count the total number of items when compared against a separate, longer list (a column of another data frame), e.g.
df['compare'] =
NaN
tree
car
fish
cat
dog
rabbit
Pear
Orange
snow
rain
Thanks
Jason
For the first question:
from collections import Counter
import pandas as pd

data = {
    "String1": ["NaN", "tree", "car", "tree"],
    "String2": ["cat", "dog", "car", "tree"],
    "String3": ["fish", "tree", "NaN", "tree"],
}
ddf = pd.DataFrame(data)
a = Counter(ddf.stack().tolist())
df_result = pd.DataFrame(dict(a), index=['Count']).T
df = pd.DataFrame({'vals': ['NaN', 'tree', 'car', 'fish', 'cat', 'dog', 'rabbit', 'Pear', 'Orange', 'snow', 'rain']})
df_counts = df.vals.map(df_result.to_dict()['Count'])
This should do :)
You can use the following code to count items across the whole data frame:
import pandas as pd

data = {'String1': ['NaN', 'tree', 'car', 'tree'],
        'String2': ['cat', 'dog', 'car', 'tree'],
        'String3': ['fish', 'tree', 'NaN', 'tree']}
df = pd.DataFrame(data)

def get_counts(df: pd.DataFrame) -> dict:
    res = {}
    for col in df.columns:
        vc = df[col].value_counts().to_dict()
        for k, v in vc.items():
            if k in res:
                res[k] += v
            else:
                res[k] = v
    return res

counts = get_counts(df)
Output
>>> print(counts)
{'tree': 5, 'car': 2, 'NaN': 2, 'cat': 1, 'dog': 1, 'fish': 1}
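For completeness, pandas can produce the same counts in one line: stack the frame into a single Series and call value_counts. Note the 'NaN' entries here are literal strings, so stack does not drop them:

```python
import pandas as pd

data = {'String1': ['NaN', 'tree', 'car', 'tree'],
        'String2': ['cat', 'dog', 'car', 'tree'],
        'String3': ['fish', 'tree', 'NaN', 'tree']}
ddf = pd.DataFrame(data)

# stack all columns into one long Series, then count each distinct value
counts = ddf.stack().value_counts().to_dict()
print(counts)
```

If the frame contained real NaN values rather than the string 'NaN', stack would silently drop them before counting.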