How does one add a different substring to each row, based on a condition, in pandas?
Here is a dummy dataframe that I created:
import numpy as np
import pandas as pd
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,5,size=(5, 2)))
df.columns = ['A','B']
If I wanted to replace the values in B with the string YYYY for those rows where the value in A is less than 2, I would do it this way:
df.loc[df['A'] < 2, 'B'] = 'YYYY'
This is the current output of original df:
A B
0 3 4
1 0 1
2 3 0
3 0 1
4 4 4
Of replaced df:
A B
0 3 4
1 0 YYYY
2 3 0
3 0 YYYY
4 4 4
What I instead want is:
A B
0 3 4
1 0 1_1
2 3 0
3 0 1_2
4 4 4
You need to generate a list with as many elements as there are True values (using range with the sum of the mask), convert it to strings, and join:
m = df['A'] < 2
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + list(map(str, range(1, m.sum() + 1)))
print(df)
A B
0 3 4
1 0 1_1
2 3 0
3 0 1_2
4 4 4
Or you can use an f-string list comprehension to generate the new values:
m = df['A'] < 2
df.loc[m, 'B'] = [f'{b}_{a}' for a, b in zip(range(1, m.sum() + 1), df.loc[m, 'B'])]
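An equivalent trick (a sketch of an alternative, not from the original answer) uses the cumulative sum of the mask itself as the counter, which keeps everything index-aligned; B is converted to string up front to avoid a dtype-change warning on assignment:

```python
import pandas as pd

# same dummy data as above, written out explicitly
df = pd.DataFrame({'A': [3, 0, 3, 0, 4], 'B': [4, 1, 0, 1, 4]})
df['B'] = df['B'].astype(str)

m = df['A'] < 2
# m.cumsum() numbers the True rows 1, 2, ...; selecting with [m] keeps only those rows
df.loc[m, 'B'] = df['B'][m] + '_' + m.cumsum()[m].astype(str)
print(df)
```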
EDIT: to make the counter restart for each distinct value in B (i.e. number the duplicates per value), use groupby with cumcount:
m = df['A'] < 4
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + df[m].groupby('B').cumcount().add(1).astype(str)
print(df)
A B
0 3 4_1
1 0 1_1
2 3 0_1
3 0 1_2
4 4 4
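A self-contained check of the EDIT above (same dummy data written out explicitly; B is converted to string up front to avoid a dtype-change warning on assignment):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 0, 3, 0, 4], 'B': [4, 1, 0, 1, 4]})
df['B'] = df['B'].astype(str)

m = df['A'] < 4
# cumcount numbers the occurrences of each distinct B value 0, 1, ...; add(1) makes it 1-based
df.loc[m, 'B'] = df.loc[m, 'B'] + '_' + df[m].groupby('B').cumcount().add(1).astype(str)
print(df)
```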
Related
l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(data = l)
col1
0 [1, 2, 3]
1 [4, 5, 6]
Desired output:
col1
0 1
1 2
2 3
3 4
4 5
5 6
Use explode (available from pandas 0.25):
df.explode('col1')
col1
0 1
0 2
0 3
1 4
1 5
1 6
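Note that explode keeps the original row index (0, 0, 0, 1, 1, 1). To get the fresh 0–5 index from the desired output, pass ignore_index=True (pandas 1.1+) or chain reset_index(drop=True):

```python
import pandas as pd

df = pd.DataFrame({'col1': [[1, 2, 3], [4, 5, 6]]})
# same as df.explode('col1').reset_index(drop=True)
out = df.explode('col1', ignore_index=True)
print(out)
```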
You can use np.ravel to flatten the list of lists:
import numpy as np, pandas as pd
l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(np.ravel(*l.values()),columns=l.keys())
>>> df
col1
0 1
1 2
2 3
3 4
4 5
5 6
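np.ravel flattens cleanly only when the sublists have equal lengths; with ragged sublists it fails or produces an object array, depending on the NumPy version. A stdlib alternative that handles both cases is itertools.chain (a sketch, not from the original answer):

```python
import itertools
import pandas as pd

l = {'col1': [[1, 2, 3], [4, 5]]}   # ragged sublists on purpose
flat = list(itertools.chain.from_iterable(l['col1']))
df = pd.DataFrame(flat, columns=['col1'])
print(df)
```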
I have the below sample df, and I'd like to select all the rows that are between a range of values in a specific column:
0 1 2 3 4 5 index
0 -252.44 -393.07 886.72 -2.04 1.58 -2.41 0
1 -260.25 -415.53 881.35 -3.07 0.08 -1.66 1
2 -267.58 -412.60 893.07 -2.98 -1.15 -2.66 2
3 -279.30 -417.97 880.86 -1.15 -0.50 -1.37 3
4 -252.93 -395.51 883.30 -1.30 1.43 4.17 4
I'd like to get the below df (all the rows between index value of 1-3):
0 1 2 3 4 5 index
1 -260.25 -415.53 881.35 -3.07 0.08 -1.66 1
2 -267.58 -412.60 893.07 -2.98 -1.15 -2.66 2
3 -279.30 -417.97 880.86 -1.15 -0.50 -1.37 3
How can I do it?
I tried the below which didn't work:
new_df = df[df['index'] >= 1 & df['index'] <= 3]
Between min and max: use between():
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1,2,3], 'b':[11,12,13]})
>>> df
a b
0 1 11
1 2 12
2 3 13
>>> df[df.a.between(1,2)]
a b
0 1 11
1 2 12
Your attempt new_df = df[df['index'] >= 1 & df['index'] <= 3] fails because & binds more tightly than the comparisons, so each condition needs its own parentheses:
new_df = df[(df['index'] >= 1) & (df['index'] <= 3)]
(If you mean the DataFrame's own index rather than the column named 'index', use df.index: df[(df.index >= 1) & (df.index <= 3)].)
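A minimal runnable version of the corrected filter (using a toy frame, not the asker's data):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40, 50]})

# parenthesize each comparison: & binds tighter than >= / <=
new_df = df[(df.index >= 1) & (df.index <= 3)]

# equivalent label-based slice; .loc is inclusive on both ends
same = df.loc[1:3]
print(new_df)
```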
import pandas as pd
df = pd.DataFrame(columns=['A','B'])
df['A']=['A','B','A','A','B','B','B']
df['B']=[2,4,3,5,6,7,8]
df
A B
0 A 2
1 B 4
2 A 3
3 A 5
4 B 6
5 B 7
6 B 8
df.columns=['id','num']
df
id num
0 A 2
1 B 4
2 A 3
3 A 5
4 B 6
5 B 7
6 B 8
I would like to apply groupby on the id column, with a condition on the num column.
I want two columns, is_even_count and is_odd_count, in the final data frame, where is_even_count counts only the even numbers in the num column after grouping and is_odd_count counts only the odd ones.
My desired output is:
is_even_count is_odd_count
A 1 2
B 3 1
How can I do this in pandas?
Take the values modulo 2, compare with 1, and map the boolean result to the column names:
d = {True:'is_odd_count', False:'is_even_count'}
df = df.groupby(['id', (df['num'] % 2 == 1).map(d)]).size().unstack(fill_value=0)
print(df)
num is_even_count is_odd_count
id
A 1 2
B 3 1
Another solution with crosstab:
df = pd.crosstab(df['id'], (df['num'] % 2 == 1).map(d))
Alternative with numpy.where:
a = np.where(df['num'] % 2 == 1, 'is_odd_count', 'is_even_count')
df = pd.crosstab(df['id'], a)
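Another option (a sketch of my own, not from the original answer) is named aggregation, which avoids building the mapping dict:

```python
import pandas as pd

df = pd.DataFrame({'id': list('ABAABBB'), 'num': [2, 4, 3, 5, 6, 7, 8]})

# summing a boolean mask counts the True values per group
out = df.groupby('id')['num'].agg(
    is_even_count=lambda s: s.mod(2).eq(0).sum(),
    is_odd_count=lambda s: s.mod(2).eq(1).sum(),
)
print(out)
```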
I split a dataframe into two parts and changed their column names separately. Here's what I got:
df1 = df[df['colname']==0]
df2 = df[df['colname']==1]
df1.columns = [ 'a'+ x for x in df1.columns]
df2.columns = [ 'b'+ x for x in df2.columns]
And it turned out that df2's columns start with 'ba' rather than 'b'. What happened?
I cannot reproduce your problem; for me it works fine.
An alternative to the list comprehension is add_prefix:
df = pd.DataFrame({'colname':[0,1,0,0,0,1],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print(df)
C D E F colname
0 7 1 5 a 0
1 8 3 3 a 1
2 9 5 6 a 0
3 4 7 9 b 0
4 2 1 2 b 0
5 3 0 4 b 1
df1 = df[df['colname']==0].add_prefix('a')
df2 = df[df['colname']==1].add_prefix('b')
print(df1)
aC aD aE aF acolname
0 7 1 5 a 0
2 9 5 6 a 0
3 4 7 9 b 0
4 2 1 2 b 0
print(df2)
bC bD bE bF bcolname
1 8 3 3 a 1
5 3 0 4 b 1
I have a dataframe and a list. I would like to iterate over the elements in the list, find each one's location in a second dataframe, and store those locations in a new dataframe.
my_list = ['1','2','3','4','5']
df1 = pd.DataFrame(my_list, columns=['Num'])
dataframe : df1
Num
0 1
1 2
2 3
3 4
4 5
dataframe : df2
0 1 2 3 4
0 9 12 8 6 7
1 11 1 4 10 13
2 5 14 2 0 3
I've tried something similar to this, but it doesn't work:
for x in my_list:
i,j= np.array(np.where(df==x)).tolist()
df2['X'] = df.append(i)
df2['Y'] = df.append(j)
so looking for a result like this
dataframe : df1 updated
Num X Y
0 1 1 1
1 2 2 2
2 3 2 4
3 4 1 2
4 5 2 0
Any hints or ideas would be appreciated.
Instead of trying to find each value in df2, why not just make df2 a flat dataframe?
df2 = df2.reset_index().melt(id_vars='index', var_name='Y', value_name='Num')
df2.columns = ['X', 'Y', 'Num']
so now your df2 is flat, with X holding the original row and Y the original column (first rows shown):
   X  Y  Num
0  0  0    9
1  1  0   11
2  2  0    5
3  0  1   12
4  1  1    1
5  2  1   14
You can of course sort by Num, and if you just want the values from your list you can filter df2 further. Note that my_list holds strings while Num holds integers, so convert before comparing:
df2 = df2[df2.Num.astype(str).isin(my_list)]
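Putting it together: a sketch (assuming the goal is the asker's desired df1 with X/Y positions) that flattens df2 with stack, which keeps the row index, then merges onto df1. Note the dtype conversion, since my_list holds strings while df2 holds integers:

```python
import pandas as pd

my_list = ['1', '2', '3', '4', '5']
df1 = pd.DataFrame(my_list, columns=['Num'])
df2 = pd.DataFrame([[9, 12, 8, 6, 7],
                    [11, 1, 4, 10, 13],
                    [5, 14, 2, 0, 3]])

# stack() yields a Series indexed by (row, column); name the levels X and Y
flat = df2.stack().rename_axis(['X', 'Y']).reset_index(name='Num')
flat['Num'] = flat['Num'].astype(str)        # match df1's string dtype
df1 = df1.merge(flat, on='Num', how='left')
print(df1)
```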