How does one add a different substring to each row, based on a condition, in pandas?
Here is a dummy dataframe that I created:
import numpy as np
import pandas as pd
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,5,size=(5, 2)))
df.columns = ['A','B']
If I wanted to replace the values in B with the string YYYY for those rows where the value in A is less than 2, I would do it this way:
df.loc[df['A'] < 2, 'B'] = 'YYYY'
This is the current output of original df:
A B
0 3 4
1 0 1
2 3 0
3 0 1
4 4 4
Of replaced df:
A B
0 3 4
1 0 YYYY
2 3 0
3 0 YYYY
4 4 4
What I instead want is:
A B
0 3 4
1 0 1_1
2 3 0
3 0 1_2
4 4 4
You need to generate a list with as many elements as there are True values (using range with the sum of the mask), convert it to strings, and join:
m = df['A'] < 2
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + list(map(str, range(1, m.sum() + 1)))
print(df)
A B
0 3 4
1 0 1_1
2 3 0
3 0 1_2
4 4 4
Or you can use an f-string list comprehension to generate the new values:
m = df['A'] < 2
df.loc[m, 'B'] = [f'{b}_{a}' for a, b in zip(range(1, m.sum() + 1), df.loc[m, 'B'])]
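An equivalent trick (a sketch of an alternative, not from the original answer) uses the cumulative sum of the mask itself as the counter, which keeps everything index-aligned; B is converted to string up front to avoid a dtype-change warning on assignment:

```python
import pandas as pd

# same dummy data as above, written out explicitly
df = pd.DataFrame({'A': [3, 0, 3, 0, 4], 'B': [4, 1, 0, 1, 4]})
df['B'] = df['B'].astype(str)

m = df['A'] < 2
# m.cumsum() numbers the True rows 1, 2, ...; selecting with [m] keeps only those rows
df.loc[m, 'B'] = df['B'][m] + '_' + m.cumsum()[m].astype(str)
print(df)
```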
EDIT: to make the counter restart for each distinct value in B (i.e. number the duplicates per value), use groupby with cumcount:
m = df['A'] < 4
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + df[m].groupby('B').cumcount().add(1).astype(str)
print(df)
A B
0 3 4_1
1 0 1_1
2 3 0_1
3 0 1_2
4 4 4
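A self-contained check of the EDIT above (same dummy data written out explicitly; B is converted to string up front to avoid a dtype-change warning on assignment):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 0, 3, 0, 4], 'B': [4, 1, 0, 1, 4]})
df['B'] = df['B'].astype(str)

m = df['A'] < 4
# cumcount numbers the occurrences of each distinct B value 0, 1, ...; add(1) makes it 1-based
df.loc[m, 'B'] = df.loc[m, 'B'] + '_' + df[m].groupby('B').cumcount().add(1).astype(str)
print(df)
```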
Related
l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(data = l)
col1
0 [1, 2, 3]
1 [4, 5, 6]
Desired output:
col1
0 1
1 2
2 3
3 4
4 5
5 6
Use explode (available from pandas 0.25):
df.explode('col1')
col1
0 1
0 2
0 3
1 4
1 5
1 6
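Note that explode keeps the original row index (0, 0, 0, 1, 1, 1). To get the fresh 0–5 index from the desired output, pass ignore_index=True (pandas 1.1+) or chain reset_index(drop=True):

```python
import pandas as pd

df = pd.DataFrame({'col1': [[1, 2, 3], [4, 5, 6]]})
# same as df.explode('col1').reset_index(drop=True)
out = df.explode('col1', ignore_index=True)
print(out)
```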
You can use np.ravel to flatten the list of lists:
import numpy as np, pandas as pd
l = {'col1': [[1,2,3], [4,5,6]]}
df = pd.DataFrame(np.ravel(*l.values()),columns=l.keys())
>>> df
col1
0 1
1 2
2 3
3 4
4 5
5 6
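np.ravel flattens cleanly only when the sublists have equal lengths; with ragged sublists it fails or produces an object array, depending on the NumPy version. A stdlib alternative that handles both cases is itertools.chain (a sketch, not from the original answer):

```python
import itertools
import pandas as pd

l = {'col1': [[1, 2, 3], [4, 5]]}   # ragged sublists on purpose
flat = list(itertools.chain.from_iterable(l['col1']))
df = pd.DataFrame(flat, columns=['col1'])
print(df)
```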
I have the below sample df, and I'd like to select all the rows that are between a range of values in a specific column:
0 1 2 3 4 5 index
0 -252.44 -393.07 886.72 -2.04 1.58 -2.41 0
1 -260.25 -415.53 881.35 -3.07 0.08 -1.66 1
2 -267.58 -412.60 893.07 -2.98 -1.15 -2.66 2
3 -279.30 -417.97 880.86 -1.15 -0.50 -1.37 3
4 -252.93 -395.51 883.30 -1.30 1.43 4.17 4
I'd like to get the below df (all the rows between index value of 1-3):
0 1 2 3 4 5 index
1 -260.25 -415.53 881.35 -3.07 0.08 -1.66 1
2 -267.58 -412.60 893.07 -2.98 -1.15 -2.66 2
3 -279.30 -417.97 880.86 -1.15 -0.50 -1.37 3
How can I do it?
I tried the below which didn't work:
new_df = df[df['index'] >= 1 & df['index'] <= 3]
Between min and max: use between():
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1,2,3], 'b':[11,12,13]})
>>> df
a b
0 1 11
1 2 12
2 3 13
>>> df[df.a.between(1,2)]
a b
0 1 11
1 2 12
Your attempt new_df = df[df['index'] >= 1 & df['index'] <= 3] fails because & binds more tightly than the comparisons, so each condition needs its own parentheses:
new_df = df[(df['index'] >= 1) & (df['index'] <= 3)]
(If you mean the DataFrame's own index rather than the column named 'index', use df.index: df[(df.index >= 1) & (df.index <= 3)].)
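A minimal runnable version of the corrected filter (using a toy frame, not the asker's data):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40, 50]})

# parenthesize each comparison: & binds tighter than >= / <=
new_df = df[(df.index >= 1) & (df.index <= 3)]

# equivalent label-based slice; .loc is inclusive on both ends
same = df.loc[1:3]
print(new_df)
```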
import pandas as pd
df = pd.DataFrame(columns=['A','B'])
df['A']=['A','B','A','A','B','B','B']
df['B']=[2,4,3,5,6,7,8]
df
A B
0 A 2
1 B 4
2 A 3
3 A 5
4 B 6
5 B 7
6 B 8
df.columns=['id','num']
df
id num
0 A 2
1 B 4
2 A 3
3 A 5
4 B 6
5 B 7
6 B 8
I would like to apply groupby on the id column, with a condition on the num column.
I want two columns, is_even_count and is_odd_count, in the final data frame, where is_even_count counts only the even numbers in the num column after grouping and is_odd_count counts only the odd ones.
My desired output is:
is_even_count is_odd_count
A 1 2
B 3 1
How can I do this in pandas?
Take the values modulo 2, compare with 1, and map the boolean result to the column names:
d = {True:'is_odd_count', False:'is_even_count'}
df = df.groupby(['id', (df['num'] % 2 == 1).map(d)]).size().unstack(fill_value=0)
print(df)
num is_even_count is_odd_count
id
A 1 2
B 3 1
Another solution with crosstab:
df = pd.crosstab(df['id'], (df['num'] % 2 == 1).map(d))
Alternative with numpy.where:
a = np.where(df['num'] % 2 == 1, 'is_odd_count', 'is_even_count')
df = pd.crosstab(df['id'], a)
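Another option (a sketch of my own, not from the original answer) is named aggregation, which avoids building the mapping dict:

```python
import pandas as pd

df = pd.DataFrame({'id': list('ABAABBB'), 'num': [2, 4, 3, 5, 6, 7, 8]})

# summing a boolean mask counts the True values per group
out = df.groupby('id')['num'].agg(
    is_even_count=lambda s: s.mod(2).eq(0).sum(),
    is_odd_count=lambda s: s.mod(2).eq(1).sum(),
)
print(out)
```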
I split a dataframe into two parts and changed their column names separately. Here's what I got:
df1 = df[df['colname']==0]
df2 = df[df['colname']==1]
df1.columns = [ 'a'+ x for x in df1.columns]
df2.columns = [ 'b'+ x for x in df2.columns]
And it turned out that df2's columns start with 'ba' rather than 'b'. What happened?
I cannot reproduce your problem; for me it works fine.
An alternative to the list comprehension is add_prefix:
df = pd.DataFrame({'colname':[0,1,0,0,0,1],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print(df)
C D E F colname
0 7 1 5 a 0
1 8 3 3 a 1
2 9 5 6 a 0
3 4 7 9 b 0
4 2 1 2 b 0
5 3 0 4 b 1
df1 = df[df['colname']==0].add_prefix('a')
df2 = df[df['colname']==1].add_prefix('b')
print(df1)
aC aD aE aF acolname
0 7 1 5 a 0
2 9 5 6 a 0
3 4 7 9 b 0
4 2 1 2 b 0
print(df2)
bC bD bE bF bcolname
1 8 3 3 a 1
5 3 0 4 b 1
I have a dataframe and a list. I would like to iterate over the elements in the list, find each one's location in a second dataframe, and store those locations in a new dataframe.
my_list = ['1','2','3','4','5']
df1 = pd.DataFrame(my_list, columns=['Num'])
dataframe : df1
Num
0 1
1 2
2 3
3 4
4 5
dataframe : df2
0 1 2 3 4
0 9 12 8 6 7
1 11 1 4 10 13
2 5 14 2 0 3
I've tried something similar to this, but it doesn't work:
for x in my_list:
i,j= np.array(np.where(df==x)).tolist()
df2['X'] = df.append(i)
df2['Y'] = df.append(j)
so looking for a result like this
dataframe : df1 updated
Num X Y
0 1 1 1
1 2 2 2
2 3 2 4
3 4 1 2
4 5 2 0
Any hints or ideas would be appreciated.
Instead of trying to find each value in df2, why not just make df2 a flat dataframe?
df2 = df2.reset_index().melt(id_vars='index', var_name='Y', value_name='Num')
df2.columns = ['X', 'Y', 'Num']
so now your df2 is flat, with X holding the original row and Y the original column (first rows shown):
   X  Y  Num
0  0  0    9
1  1  0   11
2  2  0    5
3  0  1   12
4  1  1    1
5  2  1   14
You can of course sort by Num, and if you just want the values from your list you can filter df2 further. Note that my_list holds strings while Num holds integers, so convert before comparing:
df2 = df2[df2.Num.astype(str).isin(my_list)]
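Putting it together: a sketch (assuming the goal is the asker's desired df1 with X/Y positions) that flattens df2 with stack, which keeps the row index, then merges onto df1. Note the dtype conversion, since my_list holds strings while df2 holds integers:

```python
import pandas as pd

my_list = ['1', '2', '3', '4', '5']
df1 = pd.DataFrame(my_list, columns=['Num'])
df2 = pd.DataFrame([[9, 12, 8, 6, 7],
                    [11, 1, 4, 10, 13],
                    [5, 14, 2, 0, 3]])

# stack() yields a Series indexed by (row, column); name the levels X and Y
flat = df2.stack().rename_axis(['X', 'Y']).reset_index(name='Num')
flat['Num'] = flat['Num'].astype(str)        # match df1's string dtype
df1 = df1.merge(flat, on='Num', how='left')
print(df1)
```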