How to modify values that match a specific condition in some columns with pandas - pandas

I have a question about how to modify values in a column only for the rows that match a specific condition.
For example,
from pandas import DataFrame
import pandas as pd
df = DataFrame({'name' : ['apple','pineapple','melon','orange','mango','durian'],
'amt' : [200,300,100,1,3,120]},
index = ['1','2','3','4','5','6'])
This gives:
amt name
1 200 apple
2 300 pineapple
3 100 melon
4 1 orange
5 3 mango
6 120 durian
From the result above, I want to change the amt of apple only, leaving the other items as they are.
The only thing I know is:
df.loc[df['name'].str.contains('apple'), 'amt'] = df['amt']/100
This changes not only 'apple' but also 'pineapple'.
I'd like to change only apple's amt, like this:
amt name
1 2 apple
2 300 pineapple
3 100 melon
4 1 orange
5 3 mango
6 120 durian
Can anyone help me?
Thanks for reading.
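One way to limit the change to apple is to select rows with an exact equality test instead of a substring match. The following is a minimal sketch, assuming the frame from the question; the variable name `mask` is made up for illustration, and floor division is used here (an assumption) so that the column stays integer as in the desired output.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['apple', 'pineapple', 'melon', 'orange', 'mango', 'durian'],
     'amt': [200, 300, 100, 1, 3, 120]},
    index=['1', '2', '3', '4', '5', '6'])

# Exact equality matches 'apple' only; a substring test would also hit 'pineapple'.
mask = df['name'] == 'apple'

# Floor division keeps the column as integers (an assumption; use / for floats).
df.loc[mask, 'amt'] = df.loc[mask, 'amt'] // 100

print(df)
```

Restricting both sides of the assignment to the same mask means only the selected rows are read and written.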


Python pandas dataframe, how to get the set number

Here is an example:
0 apple
1 apple
2 apple
3 orange
4 orange
I want to add a column B that holds the position of each value in the list ['apple','orange'], counting from 1. For example, apple is first in the list, so B is 1; orange is second, so B is 2:
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
The code below doesn't work:
df['B']=df['A'].tolist().index(df['A']) +1
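You can use the pd.factorize function. It encodes the values as integer codes, numbering each distinct value from 0 in order of first appearance, and returns those codes together with the unique values.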
pd.Series.factorize is also available as a method of pd.Series objects:
codes, _ = df["A"].factorize()
df["B"] = codes + 1
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
Use groupby with ngroup() + 1, passing sort=False to ensure the groups are numbered in the order they appear in the DataFrame:
df['B'] = df.groupby('A', sort=False).ngroup() + 1
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
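Both approaches above number the values in order of first appearance. If the numbering should instead follow a given list such as ['apple','orange'], one option is an ordered set of categories. This is a minimal sketch, assuming the column is called A as in the answers; the name `order` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'apple', 'apple', 'orange', 'orange']})

# The reference list from the question; its order defines the numbering.
order = ['apple', 'orange']

# Categorical codes are 0-based positions in `order`.
# Values that are not in `order` get code -1, so they would end up as 0 here.
df['B'] = pd.Categorical(df['A'], categories=order).codes + 1

print(df)
```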

How to flatten a string into several columns in pandas?

fruit = pd.DataFrame({'type': ['apple: 1 orange: 2 pear: 3']})
I want to flatten the dataframe and get the format below:
apple orange pear
1 2 3
You are making your life extremely difficult if you store multiple values in a single field. You can use basically none of the pandas functions, because they all assume that the data in a field belongs together and should stay together.
For instance, with
In [10]: fruit = pd.Series({'apple': 1, 'orange': 2, 'pear': 3})
In [11]: fruit
apple 1
orange 2
pear 3
dtype: int64
you could easily transform your data as in
In [14]: fruit.to_frame()
apple 1
orange 2
pear 3
In [15]: fruit.to_frame().T
apple orange pear
0 1 2 3
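If the data only exists as the combined string, it can be split into name/value pairs first. The following is a minimal sketch, assuming every entry has the form "name: integer" separated by spaces; the regular expression and the names `rows` and `wide` are assumptions for illustration, not part of the original answer.

```python
import re

import pandas as pd

fruit = pd.DataFrame({'type': ['apple: 1 orange: 2 pear: 3']})

# For each string, collect (name, value) pairs and turn them into a dict.
rows = [dict(re.findall(r'(\w+):\s*(\d+)', s)) for s in fruit['type']]

# One dict per original row becomes one row of the wide frame.
wide = pd.DataFrame(rows).astype(int)

print(wide)
```

Once the values live in separate columns, the usual pandas operations apply to them directly.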

How do you groupby and aggregate using conditional statements in Pandas?

Expanding on the question here, I'm wondering how to add aggregation to the following based on conditions:
Index Name Item Quantity
0 John Apple Red 10
1 John Apple Green 5
2 John Orange Cali 12
3 Jane Apple Red 10
4 Jane Apple Green 5
5 Jane Orange Cali 18
6 Jane Orange Spain 2
7 John Banana 3
8 Jane Coconut 5
9 John Lime 10
... And so forth
What I need is to convert this data into a dataframe like the following. Note: I am only interested in the total quantity of apples and of oranges, each in its own column. Any other items that appear in a group must be excluded from the aggregation on the "Quantity" column, but they should still appear as strings in the "All Items" column:
Index Name All Items Apples Total Oranges Total
0 John Apple Red, Apple Green, Orange Cali, Banana, Lime 15 12
1 Jane Apple Red, Apple Green, Orange Cali, Coconut 15 20
How do I achieve that? Many thanks in advance!
You can use groupby and pivot_table after extracting the Apple and Orange substrings, as below:
import re
s = df['Item'].str.extract("(Apple|Orange)",expand=False,flags=re.I)
# re.I used above is optional and is used for case insensitive matching
a = df.assign(Item_1=s).dropna(subset=['Item_1'])
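# The rest of this statement is cut off in the post; the pivot_table part
# below is a reconstruction based on the approach described above.
out = (a.groupby("Name")['Item'].agg(",".join).to_frame().join(
    a.pivot_table(index="Name", columns="Item_1", values="Quantity", aggfunc="sum")
    .add_suffix("_Total")).reset_index())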
Name Item Apple_Total Orange_Total
0 Jane Apple Red,Apple Green,Orange Cali,Orange Spain 15 20
1 John Apple Red,Apple Green,Orange Cali 15 12
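For the edited question, you can use the same code, except that the groupby is done on the original dataframe df instead of the subset a before the join:
# The rest of this statement is cut off in the post; the pivot_table part
# below is a reconstruction based on the approach described above.
out = (df.groupby("Name")['Item'].agg(",".join).to_frame().join(
    a.pivot_table(index="Name", columns="Item_1", values="Quantity", aggfunc="sum")
    .add_suffix("_Total")).reset_index())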
Name Item Apple_Total Orange_Total
0 Jane Apple Red,Apple Green,Orange Cali,Orange Spain... 15 20
1 John Apple Red,Apple Green,Orange Cali,Banana,Lime 15 12
First, filter only the required rows using str.contains on the Item column:
from io import StringIO
import pandas as pd
s = StringIO("""Name;Item;Quantity
John;Apple Red;10
John;Apple Green;5
John;Orange Cali;12
Jane;Apple Red;10
Jane;Apple Green;5
Jane;Orange Cali;18
Jane;Orange Spain;2
""")
df = pd.read_csv(s,sep=';')
req_items_idx = df[df.Item.str.contains('Apple|Orange')].index
df_filtered = df.loc[req_items_idx,:]
Once you have them you can further pivot the data to get the required values based on Name
pivot_df = pd.pivot_table(df_filtered,index=['Name'],columns=['Item'],aggfunc='sum')
pivot_df.columns = pivot_df.columns.droplevel()
pivot_df.columns.name = None
pivot_df = pivot_df.reset_index()
Generate the Totals for Apples and Oranges
orange_columns = pivot_df.columns[pivot_df.columns.str.contains('Orange')].tolist()
apple_columns = pivot_df.columns[pivot_df.columns.str.contains('Apple')].tolist()
pivot_df['Apples Total'] = pivot_df.loc[:,apple_columns].sum(axis=1)
pivot_df['Orange Total'] = pivot_df.loc[:,orange_columns].sum(axis=1)
A wrapper function to combine the Items together
def combine_items(inp,columns):
    res = []
    for val,col in zip(inp.values,columns):
        if not pd.isnull(val):
            res += [col]
    return ','.join(res)
req_columns = apple_columns+orange_columns
pivot_df['Items'] = pivot_df[apple_columns+orange_columns].apply(combine_items,args=([req_columns]),axis=1)
Finally you can get the required columns in a single place and print the values
total_columns = pivot_df.columns[pivot_df.columns.str.contains('Total')].tolist()
name_item_columns = pivot_df.columns[pivot_df.columns.str.contains('Name|Items')].tolist()
>>> pivot_df[name_item_columns+total_columns]
Name Items Apples Total Orange Total
0 Jane Apple Green,Apple Red,Orange Cali,Orange Spain 15.0 20.0
1 John Apple Green,Apple Red,Orange Cali 15.0 12.0
The answer is intended to outline the individual steps and approach one can take to solve something similar to this
To do this, create your Total columns before doing the groupby. Each one contains the quantity of apples (or oranges) in that row, or 0 if that row's Item is not an apple (or an orange).
df['Apples Total'] = df.apply(lambda x: x.Quantity if ('Apple' in x.Item) else 0, axis=1)
df['Oranges Total'] = df.apply(lambda x: x.Quantity if ('Orange' in x.Item) else 0, axis=1)
When this is in place, groupby name and aggregate on each column. Sum on the total columns, and aggregate to list on the item column.
df.groupby('Name').agg({'Apples Total': 'sum',
                        'Oranges Total': 'sum',
                        'Item': lambda x: list(x)})
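The same idea (zero out the quantities that do not match, then aggregate per name) can also be written without a row-wise apply. This is a minimal sketch, assuming the ten rows from the question; the helper column names `_apples` and `_oranges` are made up for illustration, and the items are joined into one string rather than collected into a list.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'John', 'Jane', 'Jane',
             'Jane', 'Jane', 'John', 'Jane', 'John'],
    'Item': ['Apple Red', 'Apple Green', 'Orange Cali', 'Apple Red', 'Apple Green',
             'Orange Cali', 'Orange Spain', 'Banana', 'Coconut', 'Lime'],
    'Quantity': [10, 5, 12, 10, 5, 18, 2, 3, 5, 10],
})

# Boolean masks marking the rows that count towards each total.
is_apple = df['Item'].str.contains('Apple')
is_orange = df['Item'].str.contains('Orange')

out = (
    # Keep Quantity where the mask is True, otherwise use 0.
    df.assign(_apples=df['Quantity'].where(is_apple, 0),
              _oranges=df['Quantity'].where(is_orange, 0))
      # sort=False keeps the names in order of first appearance.
      .groupby('Name', sort=False)
      # Named aggregation: output column = (input column, function).
      .agg(**{'All Items': ('Item', ', '.join),
              'Apples Total': ('_apples', 'sum'),
              'Oranges Total': ('_oranges', 'sum')})
      .reset_index()
)

print(out)
```

Because every item stays in the frame and only its quantity is zeroed, items such as Banana or Lime still show up in the "All Items" column without affecting the totals.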
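from io import StringIO
import pandas as pd
# The header line, rows 7-9 and the closing quotes are restored from the question's table.
df = pd.read_csv(StringIO("""Index,Name,Item,Quantity
0,John,Apple Red,10
1,John,Apple Green,5
2,John,Orange Cali,12
3,Jane,Apple Red,10
4,Jane,Apple Green,5
5,Jane,Orange Cali,18
6,Jane,Orange Spain,2
7,John,Banana,3
8,Jane,Coconut,5
9,John,Lime,10
"""), index_col="Index")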
Getting list of items
grouping by name to get the list of items
items_list = pd.DataFrame(df.groupby(["Name"])["Item"].apply(list)).rename(columns={"Item": "All Items"})
All Items
Jane [Apple Red, Apple Green, Orange Cali, Orange Spain, Coconut]
John [Apple Red, Apple Green, Orange Cali, Banana, Lime]
getting count of name item groups
renaming the temp df items column such that all the apples/oranges are treated similarly
temp2 = df.groupby(["Name", "Item"])['Quantity'].sum()
temp2 = pd.DataFrame(temp2).reset_index().set_index("Name")
temp2['Item'] = temp2['Item'].str.replace(r'(?:.*)(apple|orange)(?:.*)', r'\1', case=False,regex=True)
Item Quantity
Jane Apple 5
Jane Apple 10
Jane Coconut 5
Jane Orange 18
Jane Orange 2
John Apple 5
John Apple 10
John Banana 3
John Lime 10
John Orange 12
getting the required pivot table
pivot table for getting items count as separate column and retaining just apple orange count
pivot_df = pd.pivot_table(temp2, values='Quantity', columns='Item', index=["Name"], aggfunc='sum')
pivot_df = pivot_df[['Apple', 'Orange']]
Item Apple Orange
Jane 15.0 20.0
John 15.0 12.0
merging the items list df and the pivot_df
output = items_list.merge(pivot_df, on="Name").rename(columns = {'Apple': 'Apples Total', 'Orange': 'Oranges Total'})
All Items Apples Total Oranges Total
Jane [Apple Red, Apple Green, Orange Cali, Orange Spain, Coconut] 15.0 20.0
John [Apple Red, Apple Green, Orange Cali, Banana, Lime] 15.0 12.0

pandas search a value in a dataframe column

I have the following dataframe, and I want to search for apple in the fruits column and display all the rows where apple is found.
Before :
number fruits purchase
0 apple yes
1 apple no
2 mango yes
3 apple yes
4 grapes no
After :
number fruits purchase
0 apple yes
1 apple no
3 apple yes
Use groupby and filter to keep the groups that contain 'apple':
df['number'] = df['number'].ffill()
df_out = df.groupby('number').filter(lambda x: (x['fruits'] == 'apple').any())
df_out.assign(number = df_out['number'].mask(df.number.duplicated()))
   number  fruits purchase
0       0   apple      yes
1           mango
2          banana
3       1   apple       no
4          cheery
7       3   apple      yes
8          orange
It looks like you're using 'number' as the index, so I'm going to assume that.
Get the numbers where 'apple' is present, and slice into those:
idx = df.index[df.fruits == 'apple']
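For the flat table shown in the question, where number is an ordinary column, a plain boolean mask is enough. This is a minimal sketch, assuming the five rows from the "Before" table; the name `result` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'number': [0, 1, 2, 3, 4],
    'fruits': ['apple', 'apple', 'mango', 'apple', 'grapes'],
    'purchase': ['yes', 'no', 'yes', 'yes', 'no'],
})

# The comparison yields one True/False per row; indexing with it keeps the True rows.
result = df[df['fruits'] == 'apple']

print(result)
```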

Python 3 pandas: group a data frame by a column (such as name), then extract a number of rows from each group

There is a data frame called df, as follows:
name id age text
a 1 1 very good, and I like him
b 2 2 I play basketball with his brother
c 3 3 I hope to get a offer
d 4 4 everything goes well, I think
a 1 1 I will visit china
b 2 2 no one can understand me, I will solve it
c 3 3 I like followers
d 4 4 maybe I will be good
a 1 1 I should work hard to finish my research
b 2 2 water is the source of earth, I agree it
c 3 3 I hope you can keep in touch with me
d 4 4 My baby is very cute, I like him
The data frame is grouped by name, and I want to extract the first few rows of each group (for example, 2) into a new dataframe, df_new:
name id age text
a 1 1 very good, and I like him
a 1 1 I will visit china
b 2 2 I play basketball with his brother
b 2 2 no one can understand me, I will solve it
c 3 3 I hope to get a offer
c 3 3 I like followers
d 4 4 everything goes well, I think
d 4 4 maybe I will be good
df_new = (df.groupby('screen_name'))[0:2]
But this raises an error:
TypeError: unhashable type: 'slice'
Try using head() instead.
import pandas as pd
from io import StringIO
buff = StringIO('''name,id,age,text
a,1,1,"very good, and I like him"
b,2,2,I play basketball with his brother
c,3,3,I hope to get a offer
d,4,4,"everything goes well, I think"
a,1,1,I will visit china
b,2,2,"no one can understand me, I will solve it"
c,3,3,I like followers
d,4,4,maybe I will be good
a,1,1,I should work hard to finish my research
b,2,2,"water is the source of earth, I agree it"
c,3,3,I hope you can keep in touch with me
d,4,4,"My baby is very cute, I like him"
''')
df = pd.read_csv(buff)
Using head() instead of [:2], then sorting by name:
df_new = df.groupby('name').head(2).sort_values('name')
name id age text
0 a 1 1 very good, and I like him
4 a 1 1 I will visit china
1 b 2 2 I play basketball with his brother
5 b 2 2 no one can understand me, I will solve it
2 c 3 3 I hope to get a offer
6 c 3 3 I like followers
3 d 4 4 everything goes well, I think
7 d 4 4 maybe I will be good
Another solution with iloc:
df_new = df.groupby('name').apply(lambda x: x.iloc[:2]).reset_index(drop=True)
name id age text
0 a 1 1 very good, and I like him
1 a 1 1 I will visit china
2 b 2 2 I play basketball with his brother
3 b 2 2 no one can understand me, I will solve it
4 c 3 3 I hope to get a offer
5 c 3 3 I like followers
6 d 4 4 everything goes well, I think
7 d 4 4 maybe I will be good
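The slice fails because a GroupBy object is not a sequence of rows. Another way to express "the first two rows of each group" is to number the rows inside each group and filter on that number. This is a minimal sketch, assuming a shortened version of the question's data; the column name `text` is kept, but the values are abbreviated for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'] * 3,
    'id': [1, 2, 3, 4] * 3,
    'text': ['first'] * 4 + ['second'] * 4 + ['third'] * 4,
})

# cumcount() numbers the rows within each group: 0, 1, 2, ...
position = df.groupby('name').cumcount()

# Keep rows whose position within the group is 0 or 1, in original row order.
df_new = df[position < 2]

print(df_new.sort_values('name'))
```

Unlike the apply-based version, this keeps the original index, so the selected rows can still be traced back to df.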