Why doesn't pandas dataframe need full row values?

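import pandas as pd

fields = ['name', 'type', 'age']
df = pd.DataFrame(columns=fields)
item1 = {'name': 'john', 'type': 'student', 'age': 21}
item2 = {'name': 'john', 'age': 21}
items = [item1, item2]
for item in items:
    # DataFrame.append was removed in pandas 2.0; this line runs only on older versions
    df = df.append(item, ignore_index=True)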
I expected that only 'item1' could be appended, not 'item2', since 'item2' has only two of the three fields. Is this normal?
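A minimal sketch of the behaviour in question, assuming a current pandas version where DataFrame.append is no longer available: building the frame directly from the list of dicts shows that pandas does not require every key to be present and fills the missing 'type' value with NaN. The `items` list is an assumption added here for illustration.

```python
import pandas as pd

fields = ['name', 'type', 'age']
items = [
    {'name': 'john', 'type': 'student', 'age': 21},
    {'name': 'john', 'age': 21},  # no 'type' key
]

# Keys missing from a dict become NaN in the corresponding column
df = pd.DataFrame(items, columns=fields)
print(df)
print(pd.isna(df.loc[1, 'type']))
```

So a partial row is accepted, and the columns act as labels to align on rather than as required fields.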

Related

How to parse a nested column in a df column?

Is there a smart pythonic way to parse a nested column in a pandas dataframe like this one to 3 different columns? So for example the column could look like this:
col1
[{'name': 'amount', 'value': 1}, {'name': 'frequency', 'value': 2}, {'name': 'freq_unit', 'value': 'month'}]
[{'name': 'amount', 'value': 3}, {'name': 'frequency', 'value': 1}, {'name': 'freq_unit', 'value': 'month'}]
And the expected result should be these 3 columns:
amount frequency freq_unit
1 2 month
3 1 month
That's just level 1. Here is level 2: what if the elements in the list still have the same names (amount, frequency and freq_unit) but their order can change? Could the code in the answer deal with this?
col1
[{'name': 'amount', 'value': 1}, {'name': 'frequency', 'value': 2}, {'name': 'freq_unit', 'value': 'month'}]
[{'name': 'amount', 'value': 3}, {'name': 'freq_unit', 'value': 'month'}, {'name': 'frequency', 'value': 1}]
Code to reproduce the data is below. I really look forward to seeing how the community would solve this. Thank you.
import pandas as pd

data = {'col1': [[{'name': 'amount', 'value': 1}, {'name': 'frequency', 'value': 2}, {'name': 'freq_unit', 'value': 'month'}],
                 [{'name': 'amount', 'value': 3}, {'name': 'frequency', 'value': 1}, {'name': 'freq_unit', 'value': 'month'}]]}
df = pd.DataFrame(data)

A combination of list comprehension, itertools.chain, and collections.defaultdict could help out here:
from itertools import chain
from collections import defaultdict

data = defaultdict(list)
# Turn every dict in every row into a (name, value) pair
phase1 = [[(item["name"], item["value"]) for item in entry]
          for entry in df.col1]
# Flatten the per-row lists into one stream of pairs
phase1 = chain.from_iterable(phase1)
for key, value in phase1:
    data[key].append(value)
pd.DataFrame(data)
   amount  frequency freq_unit
0       1          2     month
1       3          1     month
The above is verbose. @piRSquared's comment is much simpler, using a list comprehension:
pd.DataFrame([{x["name"]: x["value"] for x in lst} for lst in df.col1])
Another option, though an unnecessarily roundabout one, is a list comprehension combined with pandas' string methods. Note that it relies on element position, so it does not handle the reordered case:
outcome = [(df.col1.str[num].str["value"]
            .rename(df.col1.str[num].str["name"][0]))
           for num in range(df.col1.str.len()[0])]
pd.concat(outcome, axis='columns')
@piRSquared's solution is the simplest, in my opinion.
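To check the "level 2" case from the question, here is a small sketch that runs the dict-comprehension approach on the reordered data. Because each row is first turned into a dict keyed by name, the order of the elements inside a list should not matter. The variable name `result` is an assumption introduced for illustration.

```python
import pandas as pd

# Second row has 'freq_unit' and 'frequency' swapped, as in the question
data = {'col1': [
    [{'name': 'amount', 'value': 1}, {'name': 'frequency', 'value': 2}, {'name': 'freq_unit', 'value': 'month'}],
    [{'name': 'amount', 'value': 3}, {'name': 'freq_unit', 'value': 'month'}, {'name': 'frequency', 'value': 1}],
]}
df = pd.DataFrame(data)

# One dict per row; pandas aligns the dicts by key, not by position
result = pd.DataFrame([{x['name']: x['value'] for x in lst} for lst in df['col1']])
print(result)
```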

You can write a function that will parse each cell in your Series and return a properly formatted Series and use apply to tuck the iteration away:
>>> def custom_parser(record):
...     clean_record = {rec["name"]: rec["value"] for rec in record}
...     return pd.Series(clean_record)
...
>>> df["col1"].apply(custom_parser)
   amount  frequency freq_unit
0       1          2     month
1       3          1     month

pandas same attribute comparison

I have the following dataframe:
df = pd.DataFrame([{'name': 'a', 'label': 'false', 'score': 10},
{'name': 'a', 'label': 'true', 'score': 8},
{'name': 'c', 'label': 'false', 'score': 10},
{'name': 'c', 'label': 'true', 'score': 4},
{'name': 'd', 'label': 'false', 'score': 10},
{'name': 'd', 'label': 'true', 'score': 6},
])
I want to return the names whose "false" label score is at least double the score of the "true" label. In my example, it should return only the name "c".

First pivot the data, then compute the ratio and filter on it (ge keeps a ratio of exactly 2, matching "at least double"):
new_df = df.pivot(index='name', columns='label', values='score')
new_df[new_df['false'].div(new_df['true']).ge(2)]
Output:
label  false  true
name
c         10     4
If you only want the names, you can do:
new_df.index[new_df['false'].div(new_df['true']).ge(2)].values
which gives
array(['c'], dtype=object)
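Update: Since your data is the result of orig_df.groupby().count(), you could instead do:
orig_df['label'].eq('true').groupby(orig_df['name']).mean()
and look at the rows with values <= 1/3: a "false" count at least double the "true" count means "true" makes up at most a third of the rows.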
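A rough sketch of the update's idea, assuming a hypothetical `orig_df` with one row per observation and the counts from the question (the original un-aggregated data is not shown, so this frame is invented for illustration). The grouping key is passed as the `name` column itself.

```python
import pandas as pd

# Hypothetical raw data: repeat each (name, label) pair as many times as its count
counts = {('a', 'false'): 10, ('a', 'true'): 8,
          ('c', 'false'): 10, ('c', 'true'): 4,
          ('d', 'false'): 10, ('d', 'true'): 6}
rows = [{'name': name, 'label': label}
        for (name, label), n in counts.items()
        for _ in range(n)]
orig_df = pd.DataFrame(rows)

# Share of "true" rows per name; false >= 2 * true is the same as share <= 1/3
true_share = orig_df['label'].eq('true').groupby(orig_df['name']).mean()
selected = true_share.index[true_share.le(1 / 3)].tolist()
print(selected)
```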

Why is the following BigQuery insertion failing?

Hello, I am trying to insert one row into a table. I successfully created the table as follows:
schema = [{'name': 'foo', 'type': 'STRING', 'mode': 'nullable'},{'name': 'bar', 'type': 'FLOAT', 'mode': 'nullable'}]
created = client.create_table(dataset='api_data_set_course_33', table='insert_test_333', schema=schema)
print('Creation Result ',created)
However, when I push the row I get False:
rows = [{'id': 'NzAzYmRiY', 'one': 'uno', 'two': 'dos'}]
inserted = client.push_rows('api_data_set_course_33','insert_test_333', rows, 'id')
print('Insertion Result ',inserted)
I have no idea what is wrong, and I would really appreciate help with this.
This is the API that I am testing:
https://github.com/tylertreat/BigQuery-Python
This is my complete code:
schema = [{'name': 'foo', 'type': 'STRING', 'mode': 'nullable'},{'name': 'bar', 'type': 'FLOAT', 'mode': 'nullable'}]
created = client.create_table(dataset='api_data_set_course_33', table='insert_test_333', schema=schema)
print('Creation Result ',created)
rows = [{'id': 'NzAzYmRiY', 'one': 'uno', 'two': 'dos'}]
inserted = client.push_rows('api_data_set_course_33','insert_test_333', rows, 'id')
print('Insertion Result ',inserted)
Output:
Creation Result True
Insertion Result False
After feedback I tried:
>>> client = get_client(project_id, service_account=service_account,private_key_file=key, readonly=False)
>>> schema = [{'name': 'foo', 'type': 'STRING', 'mode': 'nullable'},{'name': 'bar', 'type': 'FLOAT', 'mode': 'nullable'}]
>>> rows = [{'id': 'NzAzYmRiY', 'foo': 'uno', 'bar': 'dos'}]
>>> inserted = client.push_rows('api_data_set_course_33','insert_test_333', rows, 'id')
>>> print(inserted)
False
and also:
>>> rows = [{'id': 'NzAzYmRiY', 'foo': 'uno', 'bar': 45}]
>>> inserted = client.push_rows('api_data_set_course_33','insert_test_333', rows, 'id')
>>> print(inserted)
False
However, I still only get False.

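Your row field names don't match your schema field names: the schema defines foo and bar, but the rows use one and two. Try this instead (bar is declared as FLOAT in the schema, so give it a number rather than a string):
rows = [{'id': 'NzAzYmRiY', 'foo': 'uno', 'bar': 2.0}]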
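One way to spot this kind of mismatch before pushing is to compare the row keys with the schema names in plain Python. The sketch below does not call the BigQuery-Python client at all; `fields_not_in_schema` is a hypothetical helper introduced here, and whether the extra 'id' key matters depends on how the client handles the insert-id key, which is not verified here.

```python
schema = [{'name': 'foo', 'type': 'STRING', 'mode': 'nullable'},
          {'name': 'bar', 'type': 'FLOAT', 'mode': 'nullable'}]
rows = [{'id': 'NzAzYmRiY', 'one': 'uno', 'two': 'dos'}]


def fields_not_in_schema(rows, schema):
    """Return, for each row, the sorted keys that the schema does not define."""
    schema_names = {field['name'] for field in schema}
    return [sorted(set(row) - schema_names) for row in rows]


mismatches = fields_not_in_schema(rows, schema)
print(mismatches)
```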

Convert a list of dictionaries in a dataframe to a separate dataframe

I want to convert a list of dictionaries that is already present in a dataset column into a DataFrame.
The data looks something like this:
[{'id': 35, 'name': 'Comedy'}]
How do I convert this list of dictionaries to a DataFrame?
Thank you for your time!
I want to retrieve:
Comedy
from the list of dictionaries.

Use:
df = pd.DataFrame({'col':[[{'id': 35, 'name': 'Comedy'}],[{'id': 35, 'name': 'Western'}]]})
print (df)
                               col
0   [{'id': 35, 'name': 'Comedy'}]
1  [{'id': 35, 'name': 'Western'}]
df['new'] = df['col'].apply(lambda x: x[0].get('name'))
print (df)
                               col      new
0   [{'id': 35, 'name': 'Comedy'}]   Comedy
1  [{'id': 35, 'name': 'Western'}]  Western
If a list may contain multiple dicts:
df = pd.DataFrame({'col':[[{'id': 35, 'name': 'Comedy'}, {'id':4, 'name':'Horror'}],
                          [{'id': 35, 'name': 'Western'}]]})
print (df)
                                                 col
0  [{'id': 35, 'name': 'Comedy'}, {'id': 4, 'name...
1                    [{'id': 35, 'name': 'Western'}]
df['new'] = df['col'].apply(lambda x: [y.get('name') for y in x])
print (df)
                                                 col               new
0  [{'id': 35, 'name': 'Comedy'}, {'id': 4, 'name...  [Comedy, Horror]
1                    [{'id': 35, 'name': 'Western'}]         [Western]
And if you want to extract all values into a separate DataFrame:
df1 = pd.concat([pd.DataFrame(x) for x in df['col']], ignore_index=True)
print (df1)
   id     name
0  35   Comedy
1   4   Horror
2  35  Western
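A possible alternative for the last step, sketched under the assumptions that pandas 1.0 or later is installed (so that pd.json_normalize is available) and that every list in the column is non-empty. It flattens the lists with explode and builds the new frame in one call instead of concatenating one small frame per row.

```python
import pandas as pd

df = pd.DataFrame({'col': [[{'id': 35, 'name': 'Comedy'}, {'id': 4, 'name': 'Horror'}],
                           [{'id': 35, 'name': 'Western'}]]})

# explode() yields one dict per row; json_normalize() turns the dict keys into columns
records = df['col'].explode().tolist()
df1 = pd.json_normalize(records)
print(df1)
```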

Convert pandas to dictionary defining the columns used for the key values

I have the pandas DataFrame 'test_df', and my aim is to convert it to a dictionary:
   id     Name Gender  Age
0   1  'Peter'    'M'   32
1   2   'Lara'    'F'   45
To do so, I run this:
test_dict = test_df.set_index('id').T.to_dict()
The output is this:
{1: {'Name': 'Peter', 'Gender': 'M', 'Age': 32}, 2: {'Name': 'Lara', 'Gender': 'F', 'Age': 45}}
Now I want to keep only the 'Name' and 'Gender' columns as the values under each key. I tried modifying the line above into something like this:
test_dict = test_df.set_index('id')['Name']['Gender'].T.to_dict()
with no success.
Any suggestions, please?

You were very close; select a subset of columns with [['Name','Gender']]:
test_dict = test_df.set_index('id')[['Name','Gender']].T.to_dict()
print (test_dict)
{1: {'Name': 'Peter', 'Gender': 'M'}, 2: {'Name': 'Lara', 'Gender': 'F'}}
Also, T is not necessary if you use the parameter orient='index':
test_dict = test_df.set_index('id')[['Name','Gender']].to_dict(orient='index')
print (test_dict)
{1: {'Name': 'Peter', 'Gender': 'M'}, 2: {'Name': 'Lara', 'Gender': 'F'}}
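For completeness, a self-contained sketch that rebuilds a frame like the one in the question and confirms that both variants give the same dictionary. The construction of `test_df` and the names `subset`, `via_transpose` and `via_orient` are assumptions, since the question only shows the printed frame.

```python
import pandas as pd

test_df = pd.DataFrame({'id': [1, 2],
                        'Name': ['Peter', 'Lara'],
                        'Gender': ['M', 'F'],
                        'Age': [32, 45]})

# Double brackets select a sub-DataFrame with just these two columns
subset = test_df.set_index('id')[['Name', 'Gender']]

via_transpose = subset.T.to_dict()             # default orient is column-wise, hence the T
via_orient = subset.to_dict(orient='index')    # one inner dict per index value
print(via_transpose == via_orient)
```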
