Here is an example:
df=pd.DataFrame([('apple'),('apple'),('apple'),('orange'),('orange')],columns=['A'])
df
Out[5]:
A
0 apple
1 apple
2 apple
3 orange
4 orange
I want to assign a number next to each value. For example, apple is the first item in the list ['apple','orange'], so column B is 1 for apple and 2 for orange:
A B
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
The following doesn't work:
df['B']=df['A'].tolist().index(df['A']) +1
You can use the pd.factorize function. It encodes the values as integer codes, numbered from 0 in order of first appearance, and also returns the unique values.
factorize is also available as a method of pd.Series objects:
codes, _ = df["A"].factorize()
df["B"] = codes + 1
print(df)
A B
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
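To make the two return values of factorize concrete, here is a minimal sketch on the same data. The name label_to_number is only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"A": ["apple", "apple", "apple", "orange", "orange"]})

# factorize returns integer codes plus the unique values,
# both in order of first appearance
codes, uniques = pd.factorize(df["A"])
df["B"] = codes + 1  # shift the 0-based codes to 1-based numbers

# Illustrative helper: which number was assigned to which label
label_to_number = {label: i + 1 for i, label in enumerate(uniques)}
print(df)
print(label_to_number)
```

The second return value is handy when you later need to translate the numbers back into labels.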
Use groupby with ngroup() + 1, passing sort=False so that groups are numbered in the order they appear in the DataFrame:
df['B'] = df.groupby('A', sort=False).ngroup() + 1
df:
A B
0 apple 1
1 apple 1
2 apple 1
3 orange 2
4 orange 2
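Both answers number the values by order of appearance. If the numbers should instead follow a predefined list, as the ['apple','orange'] in the question suggests, a possible sketch is to build a lookup from that list and apply it with Series.map. The names order and mapping are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["apple", "apple", "apple", "orange", "orange"]})

# Assumed reference list; the position in this list decides the number
order = ["apple", "orange"]
mapping = {name: i for i, name in enumerate(order, start=1)}

# Values missing from the mapping would become NaN
df["B"] = df["A"].map(mapping)
print(df)
```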
fruit = pd.DataFrame({'type': ['apple: 1 orange: 2 pear: 3']})
I want to flatten the DataFrame into the format below:
apple orange pear
1 2 3
Thanks
You are making your life extremely difficult if you store multiple values in a single field. Almost none of the pandas functions will help you, because they all assume that the data in a field belongs together and should stay together.
For instance with
In [10]: fruit = pd.Series({'apple': 1, 'orange': 2, 'pear': 3})
In [11]: fruit
Out[11]:
apple 1
orange 2
pear 3
dtype: int64
you could easily transform your data as in
In [14]: fruit.to_frame()
Out[14]:
0
apple 1
orange 2
pear 3
In [15]: fruit.to_frame().T
Out[15]:
apple orange pear
0 1 2 3
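If the data really does arrive as one string per cell, one possible route is to parse it into name/value pairs first and build the frame from those. This is only a sketch: it assumes every entry has the form "name: integer" separated by whitespace, and the names pairs and flat are illustrative.

```python
import re

import pandas as pd

fruit = pd.DataFrame({"type": ["apple: 1 orange: 2 pear: 3"]})

# Assumption: each entry looks like "name: integer"
pairs = re.findall(r"(\w+):\s*(\d+)", fruit.loc[0, "type"])

# One dict per row; keys become the column names
flat = pd.DataFrame([{name: int(value) for name, value in pairs}])
print(flat)
```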
I would like to use a search result as the end point to slice a DataFrame.
import pandas as pd
df = pd.DataFrame({'A':['apple','orange','bananna','watermelon'],'B':[1,2,3,2]})
print(df)
pos = df[df['A'].str.contains('ban')]
print(pos)
: A B
: 0 apple 1
: 1 orange 2
: 2 bananna 3
: 3 watermelon 2
: A B
: 2 bananna 3
For the example below, I would like to get the output from the first row up to the row that starts with 'ban':
: A B
: 0 apple 1
: 1 orange 2
: 2 bananna 3
You can use boolean masking and .index attribute:
condition=df[df['A'].str.contains('ban')].index[-1]
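Then slice with the loc accessor (label-based, end-inclusive):
result=df.loc[:condition,:]
or, when the index is the default RangeIndex, with the iloc accessor (position-based, end-exclusive):
result=df.iloc[:condition+1,:]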
Now if you print result you will get:
A B
0 apple 1
1 orange 2
2 bananna 3
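The answer above takes the last matching row and assumes at least one match exists. A hedged variant that stops at the first match and tolerates no match at all could look like this; returning an empty frame in that case is an assumption, not something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({"A": ["apple", "orange", "bananna", "watermelon"],
                   "B": [1, 2, 3, 2]})

mask = df["A"].str.startswith("ban")

if mask.any():
    # idxmax gives the label of the first True; loc slices include the end label
    result = df.loc[:mask.idxmax()]
else:
    # Assumed fallback: an empty frame when nothing matches
    result = df.iloc[0:0]

print(result)
```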
Here is the traditional, loop-based way to do this:
lst = []
for _, row in df.iterrows():
    lst.append(list(row))  # append every row as a list to a temporary list
    if str(row['A']).startswith('ban'):  # stop at the first row starting with 'ban'
        break
new_df = pd.DataFrame(lst)
print(new_df)
Output:
0 1
0 apple 1
1 orange 2
2 bananna 3
:)
I got the following pandas DataFrame:
Value1  Value2  Food
-2      2       Apple
5       -5      Orange
-4      3       Peach
-2      6       Pineapple
I'm now trying to split up the rows into 2 Dataframes based on the 'Value1' Value. So that the outcome would look like this:
NegativeValue1:
Value1  Value2  Food
-2      2       Apple
-4      3       Peach
-2      6       Pineapple
NegativeValue2:
Value1  Value2  Food
5       -5      Orange
I've used a for loop combined with an if statement so far:
for i in range(len(data)):
    if data['Value1'].iloc[i] < 0:
        NegativeValue1 = NegativeValue1.append(data.iloc[i])
    else:
        NegativeValue2 = NegativeValue2.append(data.iloc[i])
This works, but it is too slow for large DataFrames.
Because of this I want to build a lambda function to do this, but I don't have much experience with lambda.
My attempts so far were unsuccessful:
NegativeValue1 = NegativeValue1.apply(lambda data: if data['Value1'] < 0, axis = 1)
Can anyone help?
You can use boolean indexing:
mask = df["Value1"] < 0
print(df[mask])
print("-" * 80)
print(df[~mask])
Prints:
Value1 Value2 Food
0 -2 2 Apple
2 -4 3 Peach
3 -2 6 Pineapple
--------------------------------------------------------------------------------
Value1 Value2 Food
1 5 -5 Orange
You can use boolean indexing (both simpler and faster than apply):
NegativeValue1 = data[data.Value1 < 0]
# Value1 Value2 Food
# 0 -2 2 Apple
# 2 -4 3 Peach
# 3 -2 6 Pineapple
NegativeValue2 = data[data.Value1 >= 0]
# Value1 Value2 Food
# 1 5 -5 Orange
I have the following DataFrame. I want to search for apple in the fruits column and display all the rows of each group in which apple is found.
Before :
number  fruits  purchase
0       apple   yes
        mango
        banana
1       apple   no
        cheery
2       mango   yes
        banana
3       apple   yes
        orange
4       grapes  no
        pear
After:
number  fruits  purchase
0       apple   yes
        mango
        banana
1       apple   no
        cheery
3       apple   yes
        orange
Use groupby and filter to keep only the groups that contain 'apple':
import numpy as np

df['number'] = df['number'].ffill()
df_out = df.groupby('number').filter(lambda x: (x['fruits'] == 'apple').any())
df_out.assign(number = df_out['number'].mask(df.number.duplicated()))\
.replace(np.nan,'')
Output:
number fruits purchase
0 0 apple yes
1 mango
2 banana
3 1 apple no
4 cheery
7 3 apple yes
8 orange
It looks like you're using 'number' as the index, so I'm going to assume that.
Get the numbers where 'apple' is present, and slice into those:
idx = df.index[df.fruits == 'apple']
df.loc[idx]
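For a version that can be run end to end, here is a sketch that rebuilds the question's table and keeps whole groups containing 'apple' by broadcasting a per-group flag with transform. It assumes the blank cells are NaN/None and that 'number' is an ordinary column; group_id and has_apple are illustrative names.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's table; blank cells are assumed to be missing
df = pd.DataFrame({
    "number": [0, np.nan, np.nan, 1, np.nan, 2, np.nan, 3, np.nan, 4, np.nan],
    "fruits": ["apple", "mango", "banana", "apple", "cheery", "mango",
               "banana", "apple", "orange", "grapes", "pear"],
    "purchase": ["yes", None, None, "no", None, "yes", None,
                 "yes", None, "no", None],
})

# Propagate each group's number down to its blank rows
group_id = df["number"].ffill()

# True for every row whose group contains at least one 'apple'
has_apple = df["fruits"].eq("apple").groupby(group_id).transform("any")

result = df[has_apple]
print(result)
```

Because the flag is computed without overwriting the 'number' column, the blank cells in the original layout are left as they were.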
I have a question about how to modify values in a column only for rows that meet a specific condition.
For example,
from pandas import DataFrame
import pandas as pd
df = DataFrame({'name' : ['apple','pineapple','melon','orange','mango','durian'],
'amt' : [200,300,100,1,3,120]},
index = ['1','2','3','4','5','6'])
print(df)
This prints:
amt name
1 200 apple
2 300 pineapple
3 100 melon
4 1 orange
5 3 mango
6 120 durian
From the result above, I want to change only the amt of apple and leave the other items unchanged.
So far I only know this:
df.loc[df.name.str.contains('apple'), 'amt'] = df['amt']/100
This changes not only 'apple' but also 'pineapple'.
I'd like to change only apple's amt, like this:
amt name
1 2 apple
2 300 pineapple
3 100 melon
4 1 orange
5 3 mango
6 120 durian
Can anyone help me?
Thanks for reading.
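One possible approach, assuming an exact match on the name is what is wanted: compare with == instead of str.contains, since contains matches any name that has 'apple' as a substring. The sketch below uses floor division to keep the column as integers, which is an assumption based on the desired output showing 2 rather than 2.0; is_apple is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["apple", "pineapple", "melon", "orange", "mango", "durian"],
     "amt": [200, 300, 100, 1, 3, 120]},
    index=["1", "2", "3", "4", "5", "6"],
)

# Exact equality selects 'apple' only; 'pineapple' is not matched
is_apple = df["name"] == "apple"

# Floor division keeps the values as integers (assumed from the desired output)
df.loc[is_apple, "amt"] = df.loc[is_apple, "amt"] // 100
print(df)
```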