Python pandas dataframe, how to get the set number - pandas

Here is an example:
df=pd.DataFrame([('apple'),('apple'),('apple'),('orange'),('orange')],columns=['A'])
df
Out[5]:
A
0 apple
1 apple
2 apple
3 orange
4 orange
I want to add a column B that numbers each distinct value in order of first appearance. For example, 'apple' is the first item of the list ['apple','orange'], so B is 1 for apple and 2 for orange:
A B
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
The following doesn't work:
df['B']=df['A'].tolist().index(df['A']) +1

You can use the pd.factorize function, which encodes each distinct value as an integer code, numbered from 0 in order of first appearance.
It is also available as a method of pd.Series objects:
codes, _ = df["A"].factorize()
df["B"] = codes + 1
print(df)
A B
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
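To see what the codes refer to, it can help to keep the second return value as well. The following is a small sketch on similar data with one missing value added (the sample rows and the name `uniques` are illustrative assumptions, not part of the question). By default `factorize` gives missing values the code -1, so they end up as 0 after adding 1.

```python
import numpy as np
import pandas as pd

# Illustrative data: same idea as the question, plus one missing value.
df = pd.DataFrame({"A": ["apple", "apple", "orange", np.nan, "apple"]})

# codes: one integer per row; uniques: the distinct values in order of first appearance.
codes, uniques = pd.factorize(df["A"])
df["B"] = codes + 1  # missing values have code -1, so they become 0 here

print(list(uniques))
print(df)
```

If 0 is not a suitable label for missing values, they can be filtered out or filled before factorizing.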

Use groupby with ngroup() + 1, passing sort=False so that groups are numbered in the order they appear in the DataFrame:
df['B'] = df.groupby('A', sort=False).ngroup() + 1
df:
A B
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
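The effect of sort=False only shows up when the values do not already appear in sorted order. A minimal sketch, assuming illustrative data in which 'orange' comes first (the column names `by_appearance` and `by_sorted_key` are made up for this example):

```python
import pandas as pd

# Illustrative data: 'orange' appears before 'apple'.
df = pd.DataFrame({"A": ["orange", "orange", "apple", "orange", "apple"]})

# sort=False numbers the groups in order of first appearance.
df["by_appearance"] = df.groupby("A", sort=False).ngroup() + 1

# The default (sort=True) numbers the groups by sorted key instead.
df["by_sorted_key"] = df.groupby("A").ngroup() + 1

print(df)
```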

Related

How to flat a string to several columns in pandas?

fruit = pd.DataFrame({'type': ['apple: 1 orange: 2 pear: 3']})
I want to flat the dataframe and get the below format:
apple orange pear
1 2 3
Thanks
You are making your life extremely difficult if you store multiple values in a single field. Most pandas functions will not help you, because they assume that the data in a field belongs together and should stay together.
For instance with
In [10]: fruit = pd.Series({'apple': 1, 'orange': 2, 'pear': 3})
In [11]: fruit
Out[11]:
apple 1
orange 2
pear 3
dtype: int64
you could easily transform your data as in
In [14]: fruit.to_frame()
Out[14]:
0
apple 1
orange 2
pear 3
In [15]: fruit.to_frame().T
Out[15]:
apple orange pear
0 1 2 3
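If the data really does arrive as a single string, one possible way to get to the layout above is to parse the name/value pairs first. This is only a sketch: it assumes every entry looks like `name: integer`, and the helper `parse_pairs` and the regular expression are illustrative, not part of the original answer.

```python
import re

import pandas as pd

fruit = pd.DataFrame({"type": ["apple: 1 orange: 2 pear: 3"]})


def parse_pairs(text):
    """Turn 'apple: 1 orange: 2' into {'apple': 1, 'orange': 2} (hypothetical helper)."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s*(\d+)", text)}


# One dict per row becomes one row of the new frame, with one column per name.
flat = pd.DataFrame([parse_pairs(text) for text in fruit["type"]])
print(flat)
```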

pandas - slice dataframe with condition of end point

I would like to use a search result as end point to slice a dataframe.
import pandas as pd
df = pd.DataFrame({'A':['apple','orange','bananna','watermelon'],'B':[1,2,3,2]})
print(df)
pos = df[df['A'].str.contains('ban')]
print(pos)
: A B
: 0 apple 1
: 1 orange 2
: 2 bananna 3
: 3 watermelon 2
: A B
: 2 bananna 3
For the example above, I would like to get the rows from the first row through the row that starts with 'ban', as below:
: A B
: 0 apple 1
: 1 orange 2
: 2 bananna 3
You can use boolean masking and .index attribute:
condition=df[df['A'].str.contains('ban')].index[-1]
Now finally use loc[] accessor or iloc[] accessor:
result=df.loc[:condition,:]
OR (this assumes the default integer index, so that labels equal positions):
result=df.iloc[:condition+1,:]
Now if you print result you will get:
A B
0 apple 1
1 orange 2
2 bananna 3
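Note that index[-1] picks the last matching row and raises an error when nothing matches. A possible variation, sketched here under the assumption that you want to stop at the first match and get an empty result when there is none (the function name `rows_up_to_first_match` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"A": ["apple", "orange", "bananna", "watermelon"],
                   "B": [1, 2, 3, 2]})


def rows_up_to_first_match(frame, column, pattern):
    """Return rows from the top through the first row whose text contains pattern."""
    mask = frame[column].str.contains(pattern, regex=False).to_numpy()
    if not mask.any():
        return frame.iloc[0:0]  # no match: empty frame with the same columns
    end = mask.argmax()  # position of the first True
    return frame.iloc[: end + 1]  # positional slice, independent of index labels


result = rows_up_to_first_match(df, "A", "ban")
print(result)
```

Because the slice is positional, this version does not depend on the index being the default 0, 1, 2, ...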
Here is the traditional method to do this:
lst = []
for _, row in df.iterrows():
    lst.append(list(row))  # append every row as a list to the temporary list
    if str(row['A']).startswith('ban'):  # stop after the first matching row
        break
new_df = pd.DataFrame(lst)
print(new_df)
Output:
0 1
0 apple 1
1 orange 2
2 bananna 3
:)

I want to split a DataFrame into 2 df's using lambda on a column

I got the following pandas DataFrame:
Value1 Value2 Food
-2 2 Apple
5 -5 Orange
-4 3 Peach
-2 6 Pineapple
I'm now trying to split up the rows into 2 Dataframes based on the 'Value1' Value. So that the outcome would look like this:
NegativeValue1:
Value1 Value2 Food
-2 2 Apple
-4 3 Peach
-2 6 Pineapple
NegativeValue2:
Value1 Value2 Food
5 -5 Orange
I've used a for loop combined with an if statement so far:
for i in range(len(data)):
    if data['Value1'].iloc[i] < 0:
        NegativeValue1 = NegativeValue1.append(data.iloc[i])
    else:
        NegativeValue2 = NegativeValue2.append(data.iloc[i])
This works, but it is too slow for large DataFrames.
Because of this I want to build a lambda function to do this, but I don't have much experience with lambda.
My attempts so far were unsuccessful:
NegativeValue1 = NegativeValue1.apply(lambda data: if data['Value1'] < 0, axis = 1)
Can anyone help?
You can use boolean indexing:
mask = df["Value1"] < 0
print(df[mask])
print("-" * 80)
print(df[~mask])
Prints:
Value1 Value2 Food
0 -2 2 Apple
2 -4 3 Peach
3 -2 6 Pineapple
--------------------------------------------------------------------------------
Value1 Value2 Food
1 5 -5 Orange
You can use boolean indexing (both simpler and faster than apply):
NegativeValue1 = data[data.Value1 < 0]
# Value1 Value2 Food
# 0 -2 2 Apple
# 2 -4 3 Peach
# 3 -2 6 Pineapple
NegativeValue2 = data[data.Value1 >= 0]
# Value1 Value2 Food
# 1 5 -5 Orange
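As an alternative to two separate selections, the split can also be done in one pass by grouping on the condition itself. This is a sketch on data reconstructed from the question; the names `parts`, `negative_rows` and `other_rows` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "Value1": [-2, 5, -4, -2],
    "Value2": [2, -5, 3, 6],
    "Food": ["Apple", "Orange", "Peach", "Pineapple"],
})

# Group on the boolean condition: the key is True where Value1 is negative.
parts = {bool(key): group for key, group in data.groupby(data["Value1"] < 0)}

negative_rows = parts[True]
other_rows = parts[False]

print(negative_rows)
print(other_rows)
```

For a simple two-way split, plain boolean indexing as shown above is usually the clearer choice; grouping becomes more useful when there are more than two buckets.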

pandas search a value in a dataframe column

I have the following dataframe. I want to search for apple in the fruits column and, where it is found, display all the rows that belong to that number.
Before:
number  fruits  purchase
0       apple   yes
        mango
        banana
1       apple   no
        cheery
2       mango   yes
        banana
3       apple   yes
        orange
4       grapes  no
        pear
After:
number  fruits  purchase
0       apple   yes
        mango
        banana
1       apple   no
        cheery
3       apple   yes
        orange
Use groupby and filter to filter groups that contain 'apple':
import numpy as np
df['number'] = df['number'].ffill()
df_out = df.groupby('number').filter(lambda x: (x['fruits'] == 'apple').any())
df_out.assign(number=df_out['number'].mask(df.number.duplicated()))\
      .replace(np.nan, '')
Output:
number fruits purchase
0 0 apple yes
1 mango
2 banana
3 1 apple no
4 cheery
7 3 apple yes
8 orange
It looks like you're using 'number' as the index, so I'm going to assume that.
Get the numbers where 'apple' is present, and slice into those:
idx = df.index[df.fruits == 'apple']
df.loc[idx]
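For readers who want to try this end to end, here is a self-contained sketch. It assumes the blank cells in the question are missing values (NaN/None) and that 'number' is a regular column; the names `group_id` and `has_apple` are illustrative. It uses a grouped transform instead of filter, which produces a boolean mask you can reuse.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the question; blanks are assumed to be missing values.
df = pd.DataFrame({
    "number": [0, np.nan, np.nan, 1, np.nan, 2, np.nan, 3, np.nan, 4, np.nan],
    "fruits": ["apple", "mango", "banana", "apple", "cheery", "mango",
               "banana", "apple", "orange", "grapes", "pear"],
    "purchase": ["yes", None, None, "no", None, "yes", None, "yes", None, "no", None],
})

# Forward-fill the group number so every row knows which block it belongs to.
group_id = df["number"].ffill()

# True for every row of a block that contains 'apple' anywhere.
has_apple = df["fruits"].eq("apple").groupby(group_id).transform("any")

result = df[has_apple]
print(result)
```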

How to manipulate specific condition in some columns by pandas

I have a question about how to modify values in a column only for the rows that meet a specific condition.
For example,
from pandas import DataFrame
import pandas as pd
df = DataFrame({'name' : ['apple','pineapple','melon','orange','mango','durian'],
                'amt' : [200,300,100,1,3,120]},
               index = ['1','2','3','4','5','6'])
print(df)
I can see,
amt name
1 200 apple
2 300 pineapple
3 100 melon
4 1 orange
5 3 mango
6 120 durian
From the above result, I want to change the amt of apple only and leave the other items as they are.
All I know is...
df.loc[df.name.str.contains('apple'), 'amt'] = df['amt']/100
This syntax changes not only 'apple' but also 'pineapple'.
I'd like a result where only apple's amt is revised, like...
amt name
1 2 apple
2 300 pineapple
3 100 melon
4 1 orange
5 3 mango
6 120 durian
Can anyone help me?
Thanks for reading.
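One possible approach, sketched here under the assumption that only rows whose name is exactly 'apple' should change: `str.contains('apple')` matches any name that contains the substring, which is why 'pineapple' is caught too, whereas an equality comparison matches the whole value. The name `is_apple` is illustrative, and integer division is used so that amt stays an integer column, as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({"name": ["apple", "pineapple", "melon", "orange", "mango", "durian"],
                   "amt": [200, 300, 100, 1, 3, 120]},
                  index=["1", "2", "3", "4", "5", "6"])

# Exact comparison: True only where the whole name equals 'apple'.
is_apple = df["name"] == "apple"

# Update only the matching rows; // keeps the column as integers.
df.loc[is_apple, "amt"] = df.loc[is_apple, "amt"] // 100

print(df)
```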