Passing Dataframe column as variable to a dict in a loop - pandas

I have a Dataframe with 2 columns as below:
name, type
prod_a, fruit
prod_b, vegetable
prod_c, fruit
I am trying to pass these two columns to the below dict in a loop:
data = {"name": df['name'],
"accountId": df['type']}
How can I pass the values from the DataFrame into the dict above?

If you want to loop over the rows and create a separate dictionary for each one, use:
for x, y in df[['name','type']].values:
    data = {"name": x, "accountId": y}
    print(data)
{'name': 'prod_a', 'accountId': 'fruit'}
{'name': 'prod_b', 'accountId': 'vegetable'}
{'name': 'prod_c', 'accountId': 'fruit'}
Or rename the column and use DataFrame.to_dict with orient 'records' (the one-letter abbreviation 'r' was deprecated and is rejected by recent pandas versions):
for data in df[['name','type']].rename(columns={'type':'accountId'}).to_dict('records'):
    print(data)
{'name': 'prod_a', 'accountId': 'fruit'}
{'name': 'prod_b', 'accountId': 'vegetable'}
{'name': 'prod_c', 'accountId': 'fruit'}
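If you need a single dict of lists, as in the question, use DataFrame.to_dict with orient 'list' (spelled out, since the abbreviation 'l' is rejected by recent pandas versions):
data = df[['name','type']].rename(columns={'type':'accountId'}).to_dict('list')
print(data)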
{'name': ['prod_a', 'prod_b', 'prod_c'],
'accountId': ['fruit', 'vegetable', 'fruit']}

IIUC:
import pandas as pd

df = pd.DataFrame({
    'name': ['prod_a', 'prod_b', 'prod_c'],
    'type': ['fruit', 'vegetable', 'fruit']
})
data = dict()
for i in list(df.columns):
    # store the 'type' column under the key 'accountId'; keep other column names as-is
    data.update({('accountId' if i == 'type' else i): list(df[i])})
print(data)
{'name': ['prod_a', 'prod_b', 'prod_c'],
'accountId': ['fruit', 'vegetable', 'fruit']}
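The per-row dictionaries can also be built by zipping the two columns, without going through .values or to_dict. This is only a minimal sketch that assumes the three-row frame from the question; the variable name records is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["prod_a", "prod_b", "prod_c"],
    "type": ["fruit", "vegetable", "fruit"],
})

# Pair the two columns row by row and build one dict per row.
records = [{"name": n, "accountId": t} for n, t in zip(df["name"], df["type"])]

for data in records:
    print(data)
```

Keeping the dictionaries in a list avoids overwriting data on every iteration, which is what happens if the dict is only assigned inside the loop.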

Related

Pandas boolean condition from nested list of dictionaries

[{'id': 123,
  'type': 'salary',  # Parent node
  'tx': 'house',
  'sector': 'EU',
  'transition': [{'id': 'hash',  # Child node
                  'id': 123,
                  'type': 'salary',
                  'tx': 'house'}]},
 {'userid': 123,
  'type': 'salary',  # Parent node
  'tx': 'office',
  'transition': [{'id': 'hash',  # Child node
                  'id': 123,
                  'type': 'salary',
                  'tx': 'office'}]}]
I have a pandas column ('info') that stores information as a nested list of dictionaries, like the example above.
I'm trying to build a boolean condition that checks whether the list has the following attributes:
More than one parent node with 'type' == 'salary'
The field 'tx' differs between the parent nodes with 'type' == 'salary'
So far I've tried to flatten the list and filter it, but that does not handle the first and second nodes:
a = df.iloc[0].info
values = [item for sublist in [[list(i.values()) for i in a]][0] for item in sublist]
If you want a one-line solution, you can use:
df['check'] = df['info'].apply(lambda x: True if sum([1 if i['type']=='salary' else 0 for i in x]) > 1 and [i['tx'] for i in x if i['type']=='salary'].count([i['tx'] for i in x if i['type']=='salary'][0]) != len([i['tx'] for i in x if i['type']=='salary']) else False)
Or, expanded:
def check(x):
    total_salary = sum(1 for i in x if i['type'] == 'salary')  # number of "type": "salary" matches
    tx_list = [i['tx'] for i in x if i['type'] == 'salary']  # tx values where type == 'salary'
    # True when the tx values are not all the same (an empty list counts as False)
    tx_check = bool(tx_list) and tx_list.count(tx_list[0]) != len(tx_list)
    return total_salary > 1 and tx_check

df['check'] = df['info'].apply(check)
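The same condition can be written more compactly with a set. The sketch below is self-contained and runs on hypothetical sample rows (not the asker's data); the function name has_conflicting_salaries is made up for illustration, and it uses .get so that a node without a 'type' or 'tx' key does not raise.

```python
import pandas as pd

def has_conflicting_salaries(nodes):
    """True if more than one parent node is a salary and their 'tx' values differ."""
    tx_values = [n.get("tx") for n in nodes if n.get("type") == "salary"]
    return len(tx_values) > 1 and len(set(tx_values)) > 1

# Hypothetical sample rows, only the keys the check needs.
df = pd.DataFrame({
    "info": [
        [{"type": "salary", "tx": "house"}, {"type": "salary", "tx": "office"}],
        [{"type": "salary", "tx": "house"}, {"type": "salary", "tx": "house"}],
        [{"type": "bonus", "tx": "house"}],
    ]
})

df["check"] = df["info"].apply(has_conflicting_salaries)
print(df["check"].tolist())
```

Only the first row has two salary nodes with different tx values, so only that row should be flagged.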

python complex list object to dataframe

I want to create a dataframe by expanding the child object along with the parent fields.
pd.DataFrame(lst) does not work, because it creates a dataframe with only three columns and keeps the child object as a single column.
Is it possible to do this in one line instead of iterating through the list to expand each child object? Thank you in advance.
I have a list object in Python like this:
lst = [
    {
        'id': 'rec1',
        'fields': {
            'iso': 'US',
            'name': 'U S',
            'lat': '38.9051',
            'lon': '-77.0162'
        },
        'createdTime': '2021-03-16T13:03:24.000Z'
    },
    {
        'id': 'rec2',
        'fields': {'iso': 'HK', 'name': 'China', 'lat': '0.0', 'lon': '0.0'},
        'createdTime': '2021-03-16T13:03:24.000Z'
    }
]
Expected dataframe:
Use json_normalize:
df = pd.json_normalize(lst)
print(df)
     id               createdTime fields.iso fields.name fields.lat fields.lon
0  rec1  2021-03-16T13:03:24.000Z         US         U S    38.9051   -77.0162
1  rec2  2021-03-16T13:03:24.000Z         HK       China        0.0        0.0
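If the "fields." prefix on the nested columns is not wanted, it can be stripped afterwards. The following is a rough sketch that reuses the list from the question; converting lat and lon to floats is an extra step assumed here, not something the question asked for.

```python
import pandas as pd

lst = [
    {
        "id": "rec1",
        "fields": {"iso": "US", "name": "U S", "lat": "38.9051", "lon": "-77.0162"},
        "createdTime": "2021-03-16T13:03:24.000Z",
    },
    {
        "id": "rec2",
        "fields": {"iso": "HK", "name": "China", "lat": "0.0", "lon": "0.0"},
        "createdTime": "2021-03-16T13:03:24.000Z",
    },
]

df = pd.json_normalize(lst)

# Drop the "fields." prefix that json_normalize adds to nested keys.
df.columns = [c.replace("fields.", "", 1) for c in df.columns]

# The coordinates arrive as strings; convert them to floats (assumed to be wanted).
df[["lat", "lon"]] = df[["lat", "lon"]].astype(float)

print(df)
```

Stripping the prefix is only safe when the nested keys do not collide with the top-level ones, as is the case here.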

Pandas Dataframe from a nested dictionary with list as values

I'm new to Python and pandas, and I can't figure out how to load this dict into a dataframe:
a_dict = {'position': [{'points': '57.95', 'name': 'Def'}, {'points': '121', 'name': 'PK'}, {'points': '383.1', 'name': 'RB'}, {'points': '299.96', 'name': 'QB'}, {'points': '177.8', 'name': 'TE'}, {'points': '616.42', 'name': 'WR'}], 'id': 'MIN'}
I have tried multiple for loops to iterate through the dict, but the nested list keeps me from organizing it. The data is originally in JSON format. Thank you!
I'm guessing you want the points and names as columns:
import pandas as pd

points = []
name = []
for dct in a_dict['position']:
    points.append(dct['points'])
    name.append(dct['name'])
pd.DataFrame({'points': points, 'name': name})
With the output:
   points name
0   57.95  Def
1     121   PK
2   383.1   RB
3  299.96   QB
4   177.8   TE
5  616.42   WR
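Since a_dict['position'] is already a list of dicts with the same keys, it can also be passed to the DataFrame constructor directly. A minimal sketch, assuming you want points as numbers and the team id repeated on every row (both are guesses about the desired result):

```python
import pandas as pd

a_dict = {
    "position": [
        {"points": "57.95", "name": "Def"},
        {"points": "121", "name": "PK"},
        {"points": "383.1", "name": "RB"},
        {"points": "299.96", "name": "QB"},
        {"points": "177.8", "name": "TE"},
        {"points": "616.42", "name": "WR"},
    ],
    "id": "MIN",
}

# A list of dicts maps directly onto rows; each key becomes a column.
df = pd.DataFrame(a_dict["position"])

# The points are stored as strings; convert them to numbers.
df["points"] = pd.to_numeric(df["points"])

# Repeat the top-level id on every row.
df["id"] = a_dict["id"]

print(df)
```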

Collapsing a pandas dataframe into a single column of all items and their occurrences

I have a data frame consisting of a mixture of NaNs and strings, e.g.:
data = {'String1':['NaN', 'tree', 'car', 'tree'],
        'String2':['cat','dog','car','tree'],
        'String3':['fish','tree','NaN','tree']}
ddf = pd.DataFrame(data)
I want to:
1: Count the total number of items and put them in a new data frame, e.g.:
NaN=2
tree=5
car=2
fish=1
cat=1
dog=1
2: Count the total number of items when compared to a separate, longer list (a column of another data frame), e.g.:
df['compare'] =
NaN
tree
car
fish
cat
dog
rabbit
Pear
Orange
snow
rain
Thanks
Jason
For the first question:
import pandas as pd
from collections import Counter

data = {
    "String1": ["NaN", "tree", "car", "tree"],
    "String2": ["cat", "dog", "car", "tree"],
    "String3": ["fish", "tree", "NaN", "tree"],
}
ddf = pd.DataFrame(data)
# count every value across all columns
a = Counter(ddf.stack().tolist())
df_result = pd.DataFrame(dict(a), index=['Count']).T
For the second question, map the counts onto the longer list (items that never occur come out as NaN):
df = pd.DataFrame({'vals':['NaN', 'tree', 'car', 'fish', 'cat', 'dog', 'rabbit', 'Pear', 'Orange', 'snow', 'rain']})
df_counts = df.vals.map(df_result.to_dict()['Count'])
This should do it :)
You can use the following code to count the items across the whole data frame:
import pandas as pd

data = {'String1':['NaN', 'tree', 'car', 'tree'],
        'String2':['cat','dog','car','tree'],
        'String3':['fish','tree','NaN','tree']}
df = pd.DataFrame(data)

def get_counts(df: pd.DataFrame) -> dict:
    res = {}
    for col in df.columns:
        # per-column counts, merged into the running totals
        vc = df[col].value_counts().to_dict()
        for k, v in vc.items():
            if k in res:
                res[k] += v
            else:
                res[k] = v
    return res

counts = get_counts(df)
Output
>>> print(counts)
{'tree': 5, 'car': 2, 'NaN': 2, 'cat': 1, 'dog': 1, 'fish': 1}
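Both parts of the question can also be handled with stack, value_counts and reindex. This is just a sketch under the assumption that items missing from the data should show a count of 0; the names compare and compared are illustrative. Note that the 'NaN' entries in the sample are literal strings, so they are counted like any other value.

```python
import pandas as pd

ddf = pd.DataFrame({
    "String1": ["NaN", "tree", "car", "tree"],
    "String2": ["cat", "dog", "car", "tree"],
    "String3": ["fish", "tree", "NaN", "tree"],
})
compare = ["NaN", "tree", "car", "fish", "cat", "dog",
           "rabbit", "Pear", "Orange", "snow", "rain"]

# Part 1: stack all columns into one Series and count each distinct value.
counts = ddf.stack().value_counts()

# Part 2: align the counts with the longer list; items that never occur get 0.
compared = counts.reindex(compare, fill_value=0)

print(compared)
```

If a data frame is preferred over a Series, compared.rename('Count').to_frame() gives a one-column frame indexed by item.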

Pandas - Filtering out column based on value

I have a pandas DataFrame that has two columns, as below (shown with header):
name,attribute
abc,{'attributes': {'type': 'RecordType', 'url': '/services/data/v38.0/sobjects/RecordType/000xyz'}, 'Name': 'Product 1'}
def,{'attributes': {'type': 'RecordType', 'url': '/services/data/v38.0/sobjects/RecordType/000abc'}, 'Name': 'Product 2'}
klm,{'attributes': {'type': 'RecordType', 'url': '/services/data/v38.0/sobjects/RecordType/000abc'}, 'Name': 'Product 2'}
How can I filter out the rows that have an attribute like 'Product 1'?
Could anyone assist? Thanks.
Use a list comprehension with get to build a boolean mask, then filter with boolean indexing. Using get means it also works when the key Name does not exist in some rows:
df = df[[x.get('Name') == 'Product 1' for x in df['attribute']]]
Or:
df = df[df['attribute'].apply(lambda x: x.get('Name')) == 'Product 1']
# alternative, works only if the key Name exists in every row
#df = df[df['attribute'].apply(lambda x: x['Name']) == 'Product 1']
print(df)
  name                                          attribute
0  abc  {'attributes': {'type': 'RecordType', 'url': '...
EDIT:
If you also want to filter by the nested dictionaries (the {} default keeps rows without an 'attributes' key from raising an error):
df = df[[x.get('attributes', {}).get('type') == 'RecordType' for x in df['attribute']]]
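For deeper nesting, the chained get calls can be wrapped in a small helper. The sketch below is an assumption-laden illustration: nested_get is a hypothetical helper (not a pandas function), and the sample frame is a shortened version of the question's data with one extra row that lacks the nested dict.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["abc", "def", "klm"],
    "attribute": [
        {"attributes": {"type": "RecordType"}, "Name": "Product 1"},
        {"attributes": {"type": "RecordType"}, "Name": "Product 2"},
        {"Name": "Product 2"},  # hypothetical row with no nested dict
    ],
})

def nested_get(d, *keys):
    """Walk nested dicts key by key; return None if any level is missing."""
    for key in keys:
        if not isinstance(d, dict):
            return None
        d = d.get(key)
    return d

# Boolean mask: True where attribute['attributes']['type'] is 'RecordType'.
mask = [nested_get(x, "attributes", "type") == "RecordType" for x in df["attribute"]]
filtered = df[mask]

print(filtered["name"].tolist())
```

The row without an 'attributes' key simply evaluates to False in the mask instead of raising an error.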