Removing double space and single space in data frame simultaneously - pandas

I have a column where the names are separated by a single space, a double space, or more, and I want to split the names into First Name and Last Name.
import pandas as pd

df = pd.DataFrame({'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
                   'Age': [32, 34, 36]})
df['Name'] = df['Name'].str.strip()
df[['First_Name', 'Last_Name']] = df['Name'].str.split(" ", expand=True)

This should do it:
df[['First_Name', 'Last_Name']] = df.Name.apply(lambda x: pd.Series(list((filter(None, x.split(' '))))))

Use \s+ as your split pattern. This is the regex pattern meaning "one or more whitespace characters".
Also, limit the number of splits with n=1. This means the string will only be split once (at the first occurrence of whitespace, from left to right), restricting the output to 2 columns.
df[['First_Name', 'Last_Name']] = df.Name.str.split(r'\s+', expand=True, n=1)
[out]
            Name  Age  First_Name  Last_Name
0    Steve Smith   32       Steve      Smith
1      Joe Nadal   34         Joe      Nadal
2  Roger Federer   36       Roger    Federer
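
To see why a literal split(" ") misbehaves and what \s+ combined with a single split changes, here is a minimal sketch that uses only Python's standard re module rather than pandas. The sample names with repeated spaces and the helper name split_name are assumptions made for illustration; maxsplit=1 plays the role that n=1 plays in the answer above.

```python
import re

def split_name(full_name):
    """Split a name into (first, rest) on the first run of whitespace."""
    # strip() removes leading/trailing whitespace; maxsplit=1 splits only once
    parts = re.split(r"\s+", full_name.strip(), maxsplit=1)
    # Pad with None so a single-word name still yields two values
    return tuple(parts + [None] * (2 - len(parts)))

# Hypothetical sample data with one, two and three spaces between the names
names = ["Steve Smith", "Joe  Nadal", "Roger   Federer"]

# A literal single-space split leaves empty strings behind for repeated spaces
print("Joe  Nadal".split(" "))
print([split_name(n) for n in names])
```

Because the split happens only once, a name with more than two words keeps everything after the first word together in the second value.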

Related

Remove special characters from dataframe columns

How can I remove special characters from the columns of a dataframe? For example:
name           verified  id
Jason' Carly   True      1
Eunice, Banks  None      2
Expected result
name          verified  id
Jason Carly   True      1
Eunice Banks  None      2
Assuming you want to remove everything that is not a space or an ASCII letter, use a regex:
df['name'] = df['name'].str.replace(r'[^a-zA-Z ]', '', regex=True)
output:
name verified id
0 Jason Carly True 1
1 Eunice Banks None 2
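
The character class in that answer can be checked outside pandas with the standard re module. The sketch below is only an illustration; the helper name clean_name and the extra test strings are assumptions, not part of the original question. It also shows a side effect worth knowing: digits, hyphens and accented letters are deleted too, because only a-z, A-Z and the space are kept.

```python
import re

# Keep ASCII letters and spaces only; every other character is deleted
NOT_LETTER_OR_SPACE = re.compile(r"[^a-zA-Z ]")

def clean_name(value):
    return NOT_LETTER_OR_SPACE.sub("", value)

print(clean_name("Jason' Carly"))
print(clean_name("Eunice, Banks"))
# Hypothetical extra inputs: the hyphen, the digit and the accented letter go too
print(clean_name("O'Neil-Smith 3rd"))
print(clean_name("Jos\u00e9"))
```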

Conditional merge and transformation of data in pandas

I have two data frames, and I want to create new columns in frame 1 using properties from frame 2
frame 1
Name
alice
bob
carol
frame 2
Name Type Value
alice lower 1
alice upper 2
bob equal 42
carol lower 0
desired result
frame 1
Name Lower Upper
alice 1 2
bob 42 42
carol 0 NA
The common column of both frames is Name. You can use Name to look up the bounds (of a variable), which are specified in the second frame. Frame 1 lists each name exactly once. Frame 2 might have one or two entries per name, each of which specifies a lower or an upper bound (or both at once if the type is equal). We do not need both bounds for each variable; one of them can stay empty. I would like a frame that lists the range of each variable. I can see how to do that with for-loops over the columns, but that does not seem to be in the pandas spirit. Do you have any suggestions for a compact solution? :-)
Thanks in advance
You're not looking for a merge, but rather a pivot.
(df2[df2['Name'].isin(df1['Name'])]
 .pivot(index='Name', columns='Type', values='Value')
 .reset_index()
)
But this doesn't handle the special 'equal' case.
For this, you can use a little trick. Replace 'equal' by a list with the other two values and explode to create the two rows.
(df2[df2['Name'].isin(df1['Name'])]
 .assign(Type=lambda d: d['Type'].map(lambda x: {'equal': ['lower', 'upper']}.get(x, x)))
 .explode('Type')
 .pivot(index='Name', columns='Type', values='Value')
 .reset_index()
 .convert_dtypes()
)
Output:
Name lower upper
0 alice 1 2
1 bob 42 42
2 carol 0 <NA>
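
For readers who want to follow the logic step by step, the same reshaping can be sketched in plain Python. This is not the pandas code above, only an illustration of what the filter, the expansion of 'equal' and the pivot achieve together; the function name bounds_table and the dict layout are assumptions.

```python
# Rows of frame 2 as (Name, Type, Value) tuples, taken from the question
rows = [
    ("alice", "lower", 1),
    ("alice", "upper", 2),
    ("bob", "equal", 42),
    ("carol", "lower", 0),
]
names = ["alice", "bob", "carol"]  # frame 1

def bounds_table(names, rows):
    # Start every name with both bounds missing
    table = {name: {"lower": None, "upper": None} for name in names}
    for name, kind, value in rows:
        if name not in table:
            continue  # plays the role of the isin() filter
        # 'equal' expands to both bounds, like the map + explode step
        targets = ["lower", "upper"] if kind == "equal" else [kind]
        for target in targets:
            table[name][target] = value
    return table

for name, bounds in bounds_table(names, rows).items():
    print(name, bounds["lower"], bounds["upper"])
```

One difference to keep in mind: this sketch silently overwrites a bound that is given twice for the same name, whereas pivot raises an error on duplicate index/column pairs.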

Delete abbreviations (combination of Letter+dot) from Pandas column

I'd like to delete specific parts of strings in a pandas column, such as any letter followed by a dot. For example, having a column with names:
John W. Man
Betty J. Rule
C.S. Stuart
What should remain is
John Man
Betty Rule
Stuart
So any letter followed by a dot, which represents an abbreviation, should go.
I can't think of a way with str.replace or anything like that.
Use Series.str.replace with a regex that matches a single letter followed by a dot, plus any whitespace after it:
df['col'] = df['col'].str.replace(r'[a-zA-Z]\.\s*', '', regex=True)
print (df)
col
0 John Man
1 Betty Rule
2 Stuart
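
The pattern can be tried on the sample names with the standard re module. The sketch below also shows a possible pitfall and one way around it; the \b variant and the "Sam Jr. Smith" input are assumptions added for illustration and do not come from the original answer.

```python
import re

names = ["John W. Man", "Betty J. Rule", "C.S. Stuart"]

# Pattern from the answer: any letter followed by a dot and optional whitespace
LOOSE = re.compile(r"[a-zA-Z]\.\s*")
# Assumed variant: \b requires the letter to start a word, so "Jr." is left alone
STRICT = re.compile(r"\b[a-zA-Z]\.\s*")

print([LOOSE.sub("", n) for n in names])
# With a multi-letter abbreviation the loose pattern eats its last letter
print(LOOSE.sub("", "Sam Jr. Smith"))
print(STRICT.sub("", "Sam Jr. Smith"))
```

Whether the word-boundary variant is preferable depends on the data; for the three names in the question both patterns give the same result.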

Need a way to split a string in pandas into columns with numbers

Hi, I have a string in one column:
s='123. 125. 200.'
I want to split it into 3 columns (or as many as there are numbers ending with a dot), so that each column holds a number rather than a string.
From what I understand, you can use:
s='123. 125. 200.'
pd.Series(s).str.rstrip('.').str.split('.',expand=True).apply(pd.to_numeric,errors='coerce')
0 1 2
0 123 125 200
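
The same idea (split on the dot, then convert each piece to a number) can be sketched for a single string in plain Python. The helper name parse_dotted_numbers is an assumption for illustration, and the sketch assumes the pieces are whole numbers.

```python
def parse_dotted_numbers(text):
    """Return the integers in a string such as '123. 125. 200.'."""
    # Splitting on '.' leaves an empty piece after the last dot, so blank
    # pieces are skipped; int() tolerates the leading space in ' 125'
    return [int(piece) for piece in text.split(".") if piece.strip()]

print(parse_dotted_numbers("123. 125. 200."))
```

Unlike pd.to_numeric with errors='coerce', which turns unparseable pieces into NaN, int() raises a ValueError on a piece that is not a whole number.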

Compare the above cell value and return alphabets

I have two columns, Last Name and First Name. The Last Name column contains real names, and a name may appear several times (repeats always sit in consecutive rows, never elsewhere).
Requirement:
The First Name column is empty, and I need to fill it with letters based on the Last Name: if a last name occurs 4 times, I expect A, B, C, D in the First Name column. For example:
Lastname Firstname
SMITH A
SMITH B
Conte A
Conte B
Watts A
Watts B
Speirs A
Speirs B
CONNOLLY A
Austin A
Austin B
Austin C
Austin D
Austin E
Austin F
Austin G
=CHAR(COUNTIF(A$1:A1,A2)+65)
Enter this in B2 and fill it down.
Instead of multiple nested IF's, given a sorted list, you could try:
=IF(A2<>A1,"A",CHAR(CODE(B1)+1))
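
To compare the two formulas, their logic can be mirrored in a short Python sketch. This is only an emulation for reasoning about the behaviour, not something to paste into a spreadsheet; the function names are assumptions, and names are case-folded because Excel's COUNTIF and its <> comparison ignore case.

```python
def letters_by_count(last_names):
    """Mimic CHAR(COUNTIF(A$1:A1, A2) + 65): count earlier occurrences."""
    seen = {}
    result = []
    for name in last_names:
        key = name.casefold()  # case-insensitive, like COUNTIF
        result.append(chr(65 + seen.get(key, 0)))
        seen[key] = seen.get(key, 0) + 1
    return result

def letters_by_run(last_names):
    """Mimic IF(A2<>A1, "A", CHAR(CODE(B1) + 1)): restart when the name changes."""
    result = []
    previous = None
    for name in last_names:
        key = name.casefold()
        if key != previous:
            result.append("A")  # first row of a new run of names
        else:
            result.append(chr(ord(result[-1]) + 1))  # next letter after the row above
        previous = key
    return result

sample = ["SMITH", "SMITH", "Conte", "Conte", "CONNOLLY", "Austin", "Austin", "Austin"]
print(letters_by_count(sample))
print(letters_by_run(sample))
```

On a list where repeats are consecutive, as in the question, both approaches agree. They differ only when a name reappears later: the counting version continues with the next letter, while the run-based version starts again at A.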