Modifying Dataframe column value - pandas

My pandas DataFrame contains data in the following format:
SAC1001.K
KAM10120.B01.W001
CLT004.09C
ASMA104
AJAY101.A.KAS.101
I wish to modify the column using string manipulation so that the result is:
SAC1001.K
KAM10120.B01
CLT004.09C
ASMA104
AJAY101.A
How can this be done? Regex looks like one way, but I'm not sure about it. Is there another, more elegant way to do it? Please guide me.

In [109]: df
Out[109]:
col
0 SAC1001.K
1 KAM10120.B01.W001
2 CLT004.09C
3 ASMA104
4 AJAY101.A.KAS.101
In [110]: df['col'] = df['col'].str.replace(r'(\..*?)\..*', r'\1', regex=True)
In [111]: df
Out[111]:
col
0 SAC1001.K
1 KAM10120.B01
2 CLT004.09C
3 ASMA104
4 AJAY101.A

Here is another way without regex, though it chains quite a few .str accessors:
df['col'].str.split('.').str[:2].str.join('.')
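To see the split-based approach end to end, here is a minimal runnable sketch using the sample data from the question (the column name 'col' follows the transcript above):

```python
import pandas as pd

df = pd.DataFrame({'col': ['SAC1001.K', 'KAM10120.B01.W001', 'CLT004.09C',
                           'ASMA104', 'AJAY101.A.KAS.101']})

# Split on '.', keep at most the first two parts, and rejoin with '.'.
# Values with fewer than two parts (e.g. 'ASMA104') pass through unchanged.
trimmed = df['col'].str.split('.').str[:2].str.join('.')
print(trimmed.tolist())
```

Note that .str.split uses a literal separator by default, so no escaping of the dot is needed here.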

Related

Convert transactions with several products from columns to row [duplicate]

I'm having a very tough time trying to figure out how to do this with python. I have the following table:
NAMES VALUE
john_1 1
john_2 2
john_3 3
bro_1 4
bro_2 5
bro_3 6
guy_1 7
guy_2 8
guy_3 9
And I would like to go to:
NAMES VALUE1 VALUE2 VALUE3
john 1 2 3
bro 4 5 6
guy 7 8 9
I have tried with pandas: I first split the index (NAMES), and I can create the new columns, but I have trouble assigning the values to the right column.
Can someone at least point me in the direction of a solution? I don't expect full code (I know that is not appreciated), but any help is welcome.
After splitting the NAMES column, use .pivot to reshape your DataFrame.
# Split Names and Pivot.
df['NAME_NBR'] = df['NAMES'].str.split('_').str.get(1)
df['NAMES'] = df['NAMES'].str.split('_').str.get(0)
df = df.pivot(index='NAMES', columns='NAME_NBR', values='VALUE')
# Rename columns and reset the index.
df.columns = ['VALUE{}'.format(c) for c in df.columns]
df.reset_index(inplace=True)
If you want to be slick, you can do the split in a single line:
df['NAMES'], df['NAME_NBR'] = zip(*[s.split('_') for s in df['NAMES']])
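Putting the split and pivot together, a minimal sketch with the question's data (column names NAMES, NAME_NBR, and VALUE as used above):

```python
import pandas as pd

df = pd.DataFrame({'NAMES': ['john_1', 'john_2', 'john_3', 'bro_1', 'bro_2',
                             'bro_3', 'guy_1', 'guy_2', 'guy_3'],
                   'VALUE': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Split 'john_1' into base name and number in one pass.
df['NAMES'], df['NAME_NBR'] = zip(*[s.split('_') for s in df['NAMES']])

# Pivot: one row per base name, one column per number.
out = df.pivot(index='NAMES', columns='NAME_NBR', values='VALUE')
out.columns = ['VALUE{}'.format(c) for c in out.columns]
out = out.reset_index()
print(out)
```

pivot sorts the index, so the rows come out as bro, guy, john rather than in the original order.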

DataFrame column contains characters that are not numbers - how to convert to integer?

I have the following problem: my dataframe looks like this:
A B
0 1 [5]
1 3 [1]
2 3 [118]
3 5 [34]
Now, I need column B to contain only numbers, otherwise I can't work with the data. I already tried the replace function, simply replacing "[]" with "", but that didn't work out.
Is there any other way? Maybe I can convert the whole column to keep only the numbers as integers? That would be even better than just dropping the brackets.
I'm grateful for any help; I've been stuck on this for two hours now.
If your B column contains a string, use:
df['B'] = df['B'].str[1:-1].astype(int)
If your B column contains a list of one element, use:
df['B'] = df['B'].str[0]
Update: a more robust option is to extract the digits between the brackets directly:
df['B'] = df['B'].str.extract(r'\[(.*)\]', expand=False).astype(int)
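A small runnable sketch of the str.extract approach, assuming column B holds strings such as '[5]':

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 3, 5], 'B': ['[5]', '[1]', '[118]', '[34]']})

# Capture everything between the brackets, then convert to int.
df['B'] = df['B'].str.extract(r'\[(.*)\]', expand=False).astype(int)
print(df['B'].tolist())
```

If a cell does not match the pattern, str.extract yields NaN, and astype(int) will then raise, so the pattern should cover all rows.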

Delete all rows with an empty cell anywhere in the table at once in pandas

I have googled it and found lots of questions on Stack Overflow. So suppose I have a dataframe like this:
A B
-----
1
2
4 4
The first 3 rows should be deleted. And suppose I have not 2 but 200 columns. How can I do that?
First replace empty or whitespace-only strings with NaN, then drop those rows:
import numpy as np

df = df.replace(r'^\s*$', np.nan, regex=True)
df = df.dropna()
If you only want to drop rows based on specific columns, pass those column names to dropna via its subset argument.
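A minimal sketch with a hypothetical two-column frame containing blank cells, mirroring the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1', '', '4'], 'B': ['', '2', '4']})

# Turn empty / whitespace-only cells into NaN, then drop any row with a NaN.
df = df.replace(r'^\s*$', np.nan, regex=True).dropna()
print(df)
```

This scales to any number of columns, since dropna checks the whole row by default.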

Pandas dataframe row data filtering

I have a column of data in a pandas dataframe in Bxxxx-xx-xx-xx.y format. I only require the first part (Bxxxx). How do I split the data? In addition, I also have data in BSxxxx-xx-xx-xx format in the same column, which I would like to remove using the regex='^BS' argument (for some reason, it's not working). Any help in this regard will be appreciated. BTW, I am using the df.filter command.
This should work.
df[df.col1.apply(lambda x: x.split("-")[0][0:2]!="BS")].col1.apply(lambda x: x.split("-")[0])
Consider below example:
df = pd.DataFrame({
'col':['B123-34-gd-op','BS01010-9090-00s00','B000003-3frdef4-gdi-ortp','B1263423-304-gdcd-op','Bfoo3-poo-plld-opo', 'BSfewf-sfdsd-cvc']
})
print(df)
Output:
col
0 B123-34-gd-op
1 BS01010-9090-00s00
2 B000003-3frdef4-gdi-ortp
3 B1263423-304-gdcd-op
4 Bfoo3-poo-plld-opo
5 BSfewf-sfdsd-cvc
Now let's do two tasks:
Extract the Bxxxx part from Bxxxx-xx-xx-xx.
Remove BSxxxx-formatted strings.
Consider below code which uses startswith():
df[~df.col.str.startswith('BS')].col.str.split('-').str[0]
Output:
0 B123
2 B000003
3 B1263423
4 Bfoo3
Name: col, dtype: object
Breakdown:
df[~df.col.str.startswith('BS')] gives us all the strings that do not start with BS. Next, we split those strings on - and take the first part with .col.str.split('-').str[0].
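The whole pipeline as a runnable sketch, reproducing the example frame from above:

```python
import pandas as pd

df = pd.DataFrame({
    'col': ['B123-34-gd-op', 'BS01010-9090-00s00', 'B000003-3frdef4-gdi-ortp',
            'B1263423-304-gdcd-op', 'Bfoo3-poo-plld-opo', 'BSfewf-sfdsd-cvc']
})

# Drop rows starting with 'BS', then keep the text before the first '-'.
result = df[~df.col.str.startswith('BS')].col.str.split('-').str[0]
print(result.tolist())
```

Note that the filtered result keeps the original row index (0, 2, 3, 4), which can matter if you assign it back to the frame.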
You can define a function in which you treat Bxxxx-xx-xx-xx.y as a string and just extract the first 5 characters (note this assumes the B-prefix is always exactly 5 characters long).
>>> def edit_entry(x):
... return (str(x)[:5])
>>> df['Column_name'].apply(edit_entry)
A one-liner solution would be:
df["column_name"] = df["column_name"].apply(lambda x: x[:5])

Aggregate over an index in pandas?

How can I aggregate (sum) over an index whose values I intend to map to new values? Basically I have a groupby result by two variables, where I want to group one variable into larger classes. The following code does this operation on s by mapping the first by-variable, but it seems too complicated:
import pandas as pd
mapping={1:1, 2:1, 3:3}
s=pd.Series([1]*6, index=pd.MultiIndex.from_arrays([[1,1,2,2,3,3],[1,2,1,2,1,2]]))
x=s.reset_index()
x["level_0"]=x.level_0.map(mapping)
result=x.groupby(["level_0", "level_1"])[0].sum()
Is there a way to write this more concisely?
There is a level= option for Series.sum(); you can use that for a quite concise solution. (Note that recent pandas versions have removed the level= argument from sum(); use groupby(level=...) there instead.)
In [69]:
s.index = pd.MultiIndex.from_tuples(map(lambda x: (mapping.get(x[0]), x[1]), s.index.values))
s.sum(level=(0,1))
Out[69]:
1  1    2
   2    2
3  1    1
   2    1
dtype: int64
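On recent pandas versions, where sum(level=...) is no longer available, the same idea can be sketched with groupby: map the first index level through the dict, then group by both levels.

```python
import pandas as pd

mapping = {1: 1, 2: 1, 3: 3}
s = pd.Series([1] * 6, index=pd.MultiIndex.from_arrays([[1, 1, 2, 2, 3, 3],
                                                        [1, 2, 1, 2, 1, 2]]))

# Remap level 0 via the dict, keep level 1 as-is, and sum within each pair.
result = s.groupby([s.index.get_level_values(0).map(mapping),
                    s.index.get_level_values(1)]).sum()
print(result)
```

This avoids rebuilding the MultiIndex by hand, since groupby accepts the mapped level values directly as grouping keys.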