I have a dataframe that looks like this:
col1 col2
Yes 23123
No 23423423
Yes 34234
No 13213
I want to replace the values in col2 so that if col1 is 'Yes' the result is blank, and if it's 'No' the original value is kept.
I want to see this:
col1 col2
Yes
No 23423423
Yes
No 13213
I have tried this, but the 'No' rows return None:
def map_value(x):
    if x in ['Yes']:
        return ''
    else:
        return None

df['col2'] = df['col1'].apply(map_value)
There are many ways to go about this; one of them is:
df.loc[df.col1 == 'Yes', 'col2'] = ''
Output:
col1 col2
Yes
No 23423423
Yes
No 13213
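Equivalently, a sketch using `Series.mask`, which replaces values where the condition is True and keeps the original value elsewhere:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['Yes', 'No', 'Yes', 'No'],
                   'col2': [23123, 23423423, 34234, 13213]})

# Blank out col2 wherever col1 == 'Yes'; the 'No' rows keep their value
df['col2'] = df['col2'].mask(df['col1'] == 'Yes', '')
print(df)
```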
You can use numpy for this:
import pandas as pd
import numpy as np
d = {'col1': ['yes', 'no', 'yes', 'no'], 'col2': [23123,23423423,34234,13213]}
df = pd.DataFrame(data=d)
df['col2'] = np.where(df.col1 == 'yes', '', df.col2)
df
Created df by copying the sample data from OP's post and using the following command:
df = pd.read_clipboard()
df
col1 col2
0 Yes 23123
1 No 23423423
2 Yes 34234
3 No 13213
You can try the following, which uses Series.where to keep col2 only where col1 is 'No':
m = df['col1'] == 'No'
df['col2'] = df['col2'].where(m, '')
df
After running the code, the output will be as follows:
col1 col2
0 Yes
1 No 23423423
2 Yes
3 No 13213
I have this python dictionary:
dictionary = {
    '1': 'A',
    '2': 'B',
    '3': 'C',
    '4': 'D',
    '5': 'E',
    '6': 'F',
    '7': 'G',
    '8': 'H',
    '8': 'I',
    '9': 'J',
    '0': 'L'
}
Then I have created this simple pandas dataframe:
import pandas as pd
ds = {'col1' : [12345,67890], 'col2' : [12364,78910]}
df = pd.DataFrame(data=ds)
print(df)
Which looks like this:
col1 col2
0 12345 12364
1 67890 78910
I would like to transform each and every digit in col1 (which is an int field) to the correspondent letter as per dictionary indicated above. So, basically I'd like the resulting dataframe to look like this:
col1 col2 col1_transformed
0 12345 12364 ABCDE
1 67890 78910 FGHIJ
Is there a quick, pythonic way to do so by any chance?
A possible solution (notice that 8 is repeated in your dictionary -- a typo? -- and, therefore, my result does not match yours):
def f(x):
    return ''.join([dictionary[y] for y in str(x)])

df['col3'] = df['col1'].map(f)
Output:
col1 col2 col3
0 12345 12364 ABCDE
1 67890 78910 FGIJL
Try:
df[df.columns + "_transformed"] = df.apply(
    lambda x: [
        "".join(dictionary.get(ch, "") for ch in s) for s in map(str, x)
    ],
    axis=1,
    result_type="expand",
)
print(df)
Prints:
col1 col2 col1_transformed col2_transformed
0 12345 12364 ABCDE ABCFD
1 67890 78910 FGIJL GIJAL
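Since every key is a single character, `str.translate` with a table built by `str.maketrans` is another option. A sketch (note that the duplicate '8' key in the original dictionary means the later value 'I' wins):

```python
import pandas as pd

# Duplicate '8' resolved the way a Python dict literal resolves it: later value wins
dictionary = {'1': 'A', '2': 'B', '3': 'C', '4': 'D', '5': 'E',
              '6': 'F', '7': 'G', '8': 'I', '9': 'J', '0': 'L'}

df = pd.DataFrame({'col1': [12345, 67890], 'col2': [12364, 78910]})

# Build a per-character translation table and apply it to the digit strings
table = str.maketrans(dictionary)
df['col1_transformed'] = df['col1'].astype(str).str.translate(table)
print(df)
```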
I have 2 dataframes like these:
df1 = pd.DataFrame(data = {'col1' : ['finance', 'accounting'], 'col2' : ['f1', 'a1']})
df2 = pd.DataFrame(data = {'col1' : ['finance', 'finance', 'finance', 'accounting', 'accounting', 'IT', 'IT'], 'col2' : ['f1', 'f2', 'f3', 'a1', 'a2', 'I1', 'I2']})
df1
col1 col2
0 finance f1
1 accounting a1
df2
col1 col2
0 finance f1
1 finance f2
2 finance f3
3 accounting a1
4 accounting a2
5 IT I1
6 IT I2
I would like to do LEFT JOIN on col1 and ANTI-JOIN on col2. The output should look like this:
col1 col2
finance f2
finance f3
accounting a2
Could someone please show me how to do this properly in pandas? I tried both join and merge, but neither worked for me. Thanks in advance.
You can merge and filter:
(df1.merge(df2, on='col1', suffixes=('_', None))
.loc[lambda d: d['col2'] != d.pop('col2_')]
)
Output:
col1 col2
1 finance f2
2 finance f3
4 accounting a2
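A more explicit sketch of the same semi-join/anti-join combination, using `merge` with `indicator=True` to flag which (col1, col2) pairs exist in df1:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['finance', 'accounting'], 'col2': ['f1', 'a1']})
df2 = pd.DataFrame({'col1': ['finance', 'finance', 'finance',
                             'accounting', 'accounting', 'IT', 'IT'],
                    'col2': ['f1', 'f2', 'f3', 'a1', 'a2', 'I1', 'I2']})

# Keep only departments present in df1 (the "join on col1" part) ...
kept = df2[df2['col1'].isin(df1['col1'])]

# ... then drop the exact (col1, col2) pairs found in df1 (the anti-join part)
out = (kept.merge(df1, on=['col1', 'col2'], how='left', indicator=True)
           .query('_merge == "left_only"')
           .drop(columns='_merge'))
print(out)
```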
Just for fun, here's another way (other than the really elegant solution by @mozway):
df2 = (df2
       .reset_index()                    # save index as column 'index'
       .set_index('col1')                # make 'col1' the index
       .loc[df1.col1, :]                 # filter for 'col1' values in df1
       .set_index('col2', append=True)   # add 'col2' to the index
       .drop(index=df1
             .set_index(list(df1.columns))
             .index)                     # create a multi-index from df1 and drop all matches from df2
       .reset_index()                    # make 'col1' and 'col2' columns again
       .set_index('index')               # make 'index' the index again
       .rename_axis(index=None))         # make the index anonymous
Output:
col1 col2
1 finance f2
2 finance f3
4 accounting a2
I have an existing data frame with known columns. I want to insert a row with data for each column inserted one at a time.
I first created an empty data frame with few columns-
df = pd.DataFrame(columns=['col1', 'col2', 'col3'])
df.to_csv('test.csv', sep='|', index=False)
test.csv
col1|col2|col3
Then, added a row with data inserted for each column one at a time.
list = ['col1', 'col2', 'col3']
turn = 2
df = pd.read_csv('test.csv', sep='|')
while turn:
    for each in list:
        df[each] = turn
    turn -= 1
Expected output test.csv
col1|col2|col3
2 |2 |2
1 |1 |1
But I am unable to get the expected output, instead, I'm getting this
col1|col2|col3
Kindly let me know where I'm making mistake, I would really appreciate any sort of help.
You can use df.append() to append a row (note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, where pd.concat is the replacement):
import pandas as pd

df = pd.DataFrame(columns=['col1', 'col2', 'col3'])
turn = 2
while turn:
    new_row = {'col1': turn, 'col2': turn, 'col3': turn}
    df = df.append(new_row, ignore_index=True)
    turn -= 1
Out[11]:
col1 col2 col3
0 2 2 2
1 1 1 1
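On pandas ≥ 2.0, where DataFrame.append no longer exists, an equivalent sketch with pd.concat (collecting the rows first instead of growing the frame one row at a time):

```python
import pandas as pd

df = pd.DataFrame(columns=['col1', 'col2', 'col3'])

# Collect the new rows in a plain list, then concatenate once
rows = []
turn = 2
while turn:
    rows.append({'col1': turn, 'col2': turn, 'col3': turn})
    turn -= 1

df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
print(df)
```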
To modify your while loop, do:
turn = 2
while turn:
    for each in list:
        df.loc[len(df.dropna()), each] = turn
    turn -= 1
>>> df
col1 col2 col3
0 2 2 2
1 1 1 1
>>>
The reason your version doesn't work is that df[each] = turn assigns to the whole column, not to a specific row.
I have a df with lists. I'm trying to remove the brackets from each list:
df
a
0 ['a','b']
1 ['a']
2 ['a','b','c']
3 []
Expected output:
a
0 'a','b'
1 'a'
2 'a','b','c'
3
Here is what you can do:
import pandas as pd

df = pd.DataFrame({'a': [['a', 'b'],
                         ['a'],
                         ['a', 'b', 'c'],
                         []]})
df['a'] = ["'" + "', '".join(n) + "'" for n in df['a']]
print(df)
Output:
a
0 'a', 'b'
1 'a'
2 'a', 'b', 'c'
3 ''
You can do:
df['a'] = ("'"+df['a'].agg("','".join)+"'").replace("''", "")
Output:
a
0 'a','b'
1 'a'
2 'a','b','c'
3
Another way:
m = df['a'].str.len() > 0  # boolean mask selecting the non-empty lists
df.loc[m, 'a'] = ["'" + item + "'" for item in df.loc[m, 'a'].str.join("','")]  # join the non-empty lists with quoted commas
df = df.mask(df.applymap(str).eq('[]'))  # convert each empty list into a NaN
# df.loc[~m, 'a'] = ''
print(df)
a
0 'a','b'
1 'a'
2 'a','b','c'
3 NaN
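For completeness, a plain list comprehension also covers the empty-list case directly (a sketch; it assumes every cell is a list of strings):

```python
import pandas as pd

df = pd.DataFrame({'a': [['a', 'b'], ['a'], ['a', 'b', 'c'], []]})

# Quote each element and join with commas; an empty list becomes an empty string
df['a'] = [",".join(f"'{x}'" for x in lst) for lst in df['a']]
print(df)
```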
I'm trying to substring a column based on the length of another column but the resultset is NaN. What am I doing wrong?
import pandas as pd
df = pd.DataFrame([['abcdefghi','xyz'], ['abcdefghi', 'z']], columns=['col1', 'col2'])
df.col1.str[:df.col2.str.len()]
0 NaN
1 NaN
Name: col1, dtype: float64
Here is what I am expecting:
0 'abc'
1 'a'
String indexing with .str[...] doesn't accept a Series of lengths, which is why you get NaN. I would use a list comprehension:
df['extract'] = [r.col1[:len(r.col2)] for _,r in df.iterrows()]
Or
df['extract'] = [s1[:len(s2)] for s1,s2 in zip(df.col1, df.col2)]
Output:
col1 col2 extract
0 abcdefghi xyz abc
1 abcdefghi z a
Using numpy and converting the array to a pd.Series:
def slicer(start=None, stop=None, step=1):
    return np.vectorize(lambda x: x[start:stop:step], otypes=[str])

df["new_str"] = pd.Series(
    [slicer(0, i)(c) for i, c in zip(df["col2"].apply(len), df["col1"].values)]
)
print(df)
col1 col2 new_str
0 abcdefghi xyz abc
1 abcdefghi z a
Here is a solution using lambda:
df['new'] = df.apply(lambda row: row['col1'][0:len(row['col2'])], axis=1)
Result:
col1 col2 new
0 abcdefghi xyz abc
1 abcdefghi z a