Pandas DataFrame from a nested dictionary with lists as values - pandas

I'm new to Python and pandas, and I can't figure out a way to push this dict into a DataFrame:
a_dict = {'position': [{'points': '57.95', 'name': 'Def'}, {'points': '121', 'name': 'PK'}, {'points': '383.1', 'name': 'RB'}, {'points': '299.96', 'name': 'QB'}, {'points': '177.8', 'name': 'TE'}, {'points': '616.42', 'name': 'WR'}], 'id': 'MIN'}
I have tried multiple for loops to iterate through the dict, but the list keeps me from organizing it. The data is originally in JSON format. Thank you!

I'm guessing you want the points and names as columns:
import pandas as pd

points = []
name = []
for dct in a_dict['position']:
    points.append(dct['points'])
    name.append(dct['name'])
pd.DataFrame({'points': points, 'name': name})
With the output:
points name
0 57.95 Def
1 121 PK
2 383.1 RB
3 299.96 QB
4 177.8 TE
5 616.42 WR
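Since a_dict['position'] is already a flat list of dicts, the DataFrame can also be built directly from it; a minimal sketch (carrying the id along as an extra column is my own addition):
import pandas as pd

# DataFrame accepts a list of dicts and turns the keys into columns
df = pd.DataFrame(a_dict['position'])
# optional: keep the team id as its own column
df['id'] = a_dict['id']
print(df)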

Related

python complex list object to dataframe

I wanted to create a dataframe by expanding the child list object along with the parent objects. Obviously, trying pd.DataFrame(lst) does not work, as it creates a data frame with only three columns and keeps the child object as one column.
Is it possible to do this in one line instead of iterating through the list to expand each child object? Thank you in advance.
I have a list object in Python like this:
lst = [
    {
        'id': 'rec1',
        'fields': {
            'iso': 'US',
            'name': 'U S',
            'lat': '38.9051',
            'lon': '-77.0162'
        },
        'createdTime': '2021-03-16T13:03:24.000Z'
    },
    {
        'id': 'rec2',
        'fields': {'iso': 'HK', 'name': 'China', 'lat': '0.0', 'lon': '0.0'},
        'createdTime': '2021-03-16T13:03:24.000Z'
    }
]
Expected dataframe:
Use json_normalize:
df = pd.json_normalize(lst)
print (df)
id createdTime fields.iso fields.name fields.lat fields.lon
0 rec1 2021-03-16T13:03:24.000Z US U S 38.9051 -77.0162
1 rec2 2021-03-16T13:03:24.000Z HK China 0.0 0.0
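If the fields. prefix on the flattened columns is unwanted, json_normalize accepts a sep argument, and the prefix can also be stripped afterwards; a small sketch, reusing lst from the question:
import pandas as pd

# flatten with an underscore separator instead of the default dot
df = pd.json_normalize(lst, sep='_')   # columns: id, createdTime, fields_iso, fields_name, ...

# or flatten first, then drop the 'fields.' prefix from the column names
df = pd.json_normalize(lst)
df.columns = df.columns.str.replace('fields.', '', regex=False)
print(df)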

Passing Dataframe column as variable to a dict in a loop

I have a Dataframe with 2 columns as below:
name, type
prod_a, fruit
prod_b, vegetable
prod_c, fruit
I am trying to pass these two columns to the below dict in a loop:
data = {"name": df['name'],
"accountId": df['type']}
How could I pass values from the Dataframe into the above dict data
If you want to loop over each row and create a dictionary per row, use:
for x, y in df[['name','type']].values:
    data = {"name": x, "accountId": y}
    print (data)
{'name': 'prod_a', 'accountId': 'fruit'}
{'name': 'prod_b', 'accountId': 'vegetable'}
{'name': 'prod_c', 'accountId': 'fruit'}
Or rename the column and use DataFrame.to_dict with the 'records' orient (the short alias 'r' is deprecated in newer pandas):
for data in df[['name','type']].rename(columns={'type':'accountId'}).to_dict('records'):
    print (data)
{'name': 'prod_a', 'accountId': 'fruit'}
{'name': 'prod_b', 'accountId': 'vegetable'}
{'name': 'prod_c', 'accountId': 'fruit'}
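An itertuples-based loop is another way to build one dict per row; a minimal sketch, assuming the column names from the question:
# each row comes back as a namedtuple whose fields are the column names
for row in df.itertuples(index=False):
    data = {"name": row.name, "accountId": row.type}
    print(data)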
If you need a single dict of lists instead (the same shape as the data dict in the question), use DataFrame.to_dict with the 'list' orient (the short alias 'l' is likewise deprecated):
data = df[['name','type']].rename(columns={'type':'accountId'}).to_dict('list')
print (data)
{'name': ['prod_a', 'prod_b', 'prod_c'],
'accountId': ['fruit', 'vegetable', 'fruit']}
IIUC:
df = pd.DataFrame({
    'name': ['prod_a', 'prod_b', 'prod_c'],
    'type': ['fruit', 'vegetable', 'fruit']
})
data = dict()
for i in list(df.columns):
    data.update({('accountId' if i=='type' else i): list(df[i])})
print(data)
{'name': ['prod_a', 'prod_b', 'prod_c'],
'accountId': ['fruit', 'vegetable', 'fruit']}
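If you end up with the dict-of-lists shape but still need one dict per row, the two forms are easy to convert between with the standard library; a small sketch, assuming data holds the output above:
# turn {'name': [...], 'accountId': [...]} into one dict per row
rows = [dict(zip(data, values)) for values in zip(*data.values())]
print(rows)
# [{'name': 'prod_a', 'accountId': 'fruit'}, {'name': 'prod_b', 'accountId': 'vegetable'}, {'name': 'prod_c', 'accountId': 'fruit'}]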

Collapsing a pandas dataframe into a single column of all items and their occurrences

I have a data frame consisting of a mixture of NaNs and strings, e.g.
data = {'String1':['NaN', 'tree', 'car', 'tree'],
        'String2':['cat','dog','car','tree'],
        'String3':['fish','tree','NaN','tree']}
ddf = pd.DataFrame(data)
I want to:
1: count the total number of items and put them in a new data frame, e.g.
NaN=2
tree=5
car=2
fish=1
cat=1
dog=1
2: count the total number of items when compared against a separate, longer list (a column of another data frame), e.g.
df['compare'] =
NaN
tree
car
fish
cat
dog
rabbit
Pear
Orange
snow
rain
Thanks
Jason
For the first question:
import pandas as pd
from collections import Counter

data = {
    "String1": ["NaN", "tree", "car", "tree"],
    "String2": ["cat", "dog", "car", "tree"],
    "String3": ["fish", "tree", "NaN", "tree"],
}
ddf = pd.DataFrame(data)

# stack all columns into a single Series and count every value
a = Counter(ddf.stack().tolist())
df_result = pd.DataFrame(dict(a), index=['Count']).T

# for the second question: map the counts onto the longer comparison list
df = pd.DataFrame({'vals':['NaN', 'tree', 'car', 'fish', 'cat', 'dog', 'rabbit', 'Pear', 'Orange', 'snow', 'rain']})
df_counts = df.vals.map(df_result.to_dict()['Count'])
This should do :)
You can use the following code to count the items over the whole data frame.
import pandas as pd

data = {'String1':['NaN', 'tree', 'car', 'tree'],
        'String2':['cat','dog','car','tree'],
        'String3':['fish','tree','NaN','tree']}
df = pd.DataFrame(data)

def get_counts(df: pd.DataFrame) -> dict:
    res = {}
    for col in df.columns:
        vc = df[col].value_counts().to_dict()
        for k, v in vc.items():
            if k in res:
                res[k] += v
            else:
                res[k] = v
    return res

counts = get_counts(df)
Output
>>> print(counts)
{'tree': 5, 'car': 2, 'NaN': 2, 'cat': 1, 'dog': 1, 'fish': 1}
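Both steps can also be written more compactly with stack and value_counts; a minimal sketch, reusing ddf and the comparison list from the question:
# question 1: stack all cell values into one Series and count them
counts = ddf.stack().value_counts()

# question 2: look the counts up for each entry of the longer list, 0 where absent
compare = pd.Series(['NaN', 'tree', 'car', 'fish', 'cat', 'dog',
                     'rabbit', 'Pear', 'Orange', 'snow', 'rain'])
result = compare.map(counts).fillna(0).astype(int)
print(result)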

Pandas - Filtering out column based on value

I have a Pandas DataFrame with two columns as below (shown with header):
name,attribute
abc,{'attributes': {'type': 'RecordType', 'url': '/services/data/v38.0/sobjects/RecordType/000xyz'}, 'Name': 'Product 1'}
def,{'attributes': {'type': 'RecordType', 'url': '/services/data/v38.0/sobjects/RecordType/000abc'}, 'Name': 'Product 2'}
klm,{'attributes': {'type': 'RecordType', 'url': '/services/data/v38.0/sobjects/RecordType/000abc'}, 'Name': 'Product 2'}
How could I filter the rows whose attribute has 'Name' equal to 'Product 1'?
Could anyone assist? Thanks.
Use a list comprehension with get (which also works if the Name key does not exist in some rows) to build a boolean mask, then filter by boolean indexing:
df = df[[x.get('Name') == 'Product 1' for x in df['attribute']]]
Or:
df = df[df['attribute'].apply(lambda x: x.get('Name')) == 'Product 1']
#alternative, working if all Name exist in each row
#df = df[df['attribute'].apply(lambda x: x['Name']) == 'Product 1']
print (df)
name attribute
0 abc {'attributes': {'type': 'RecordType', 'url': '...
EDIT:
If you also want to filter by the nested dictionaries:
df = df[[x.get('attributes').get('type') == 'RecordType' for x in df['attribute']]]
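Another option is to flatten the dict column with json_normalize and filter on the resulting flat columns; a sketch assuming, as in the answer above, that the attribute column holds actual dicts:
import pandas as pd

# expand the dicts in 'attribute' into flat columns ('Name', 'attributes.type', ...)
attrs = pd.json_normalize(df['attribute'].tolist())

# keep the rows whose Name is 'Product 1' (positional mask, so the indexes need not match)
df_filtered = df[(attrs['Name'] == 'Product 1').to_numpy()]
print(df_filtered)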

missing data in pandas profiling report

I am using Python 2.7 and Pandas Profiling to generate a report out of a dataframe. Following is my code:
import pandas as pd
import pandas_profiling
from pandas import Timestamp  # the records below use Timestamp directly
# the actual dataset is very large, just providing two elements of the list
data = [{'polarity': 0.0, 'name': u'danesh bhopi', 'sentiment': 'Neutral', 'tweet_id': 1049952424818020353, 'original_tweet_id': 1049952424818020353, 'created_at': Timestamp('2018-10-10 14:18:59'), 'tweet_text': u"Wouldn't mind aus 120 all-out but before that would like to see a Finch \U0001f4af #PakVAus #AUSvPAK", 'source': u'Twitter for Android', 'location': u'pune', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'DaneshBhope'}, {'polarity': 1.0, 'name': u'kamal Kishor parihar', 'sentiment': 'Positive', 'tweet_id': 1049952403980775425, 'original_tweet_id': 1049952403980775425, 'created_at': Timestamp('2018-10-10 14:18:54'), 'tweet_text': u'#the_summer_game What you and Australia think\nPlay for\n win \nDraw\n or....! #PakvAus', 'source': u'Twitter for Android', 'location': u'chembur Mumbai ', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'kaluparihar1'}]
df = pd.DataFrame(data) #data is a python list containing python dictionaries
pfr = pandas_profiling.ProfileReport(df)
pfr.to_file("df_report.html")
A screenshot of part of the df_report.html file is below:
As you can see in the image, the Unique(%) field in all the variables is 0.0 although the columns have unique values.
Apart from this, the chart for the 'location' variable is broken. There are no bars for the values 22, 15, and 4; the only bar is for the maximum value. This is happening for all the variables.
Any help would be appreciated.