Is there a way to selectively replace content in a dataframe? - pandas

df.replace(['《', '》', ' ,\t'], ['', '', '|'], regex=True, inplace=True)
I have a dataFrame in which I want to replace the 3 characters. But these replace all the columns. Can I exclude a particular column in the above code that won't be replaced? For example, I have a column called 'Summary' that I don't want to apply these replacements.
Is that possible?

Assuming you only want to exclude df['Summary'], you can use .loc to select all columns but that one and then replace the strings you need to replace.
I dropped the inplace=True because I'm not sure it works together with .loc assignment.
df.loc[:, df.columns != 'Summary'] = df.loc[:, df.columns != 'Summary'].replace(['《','》',' ,\t'],['','','|'], regex=True)
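As a minimal runnable sketch of this (the frame and values below are made up for illustration; only the 'Summary' column name comes from the question):

```python
import pandas as pd

# Toy frame: only 'Summary' comes from the question, the rest is made up
df = pd.DataFrame({'Summary': ['《a》'], 'Other': ['《b》']})

# Boolean mask selecting every column except 'Summary'
cols = df.columns != 'Summary'
df.loc[:, cols] = df.loc[:, cols].replace(['《', '》', ' ,\t'], ['', '', '|'], regex=True)

print(df)
```

'Summary' keeps its brackets while every other column is cleaned.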

Try passing a dict:
df.update(df.drop(columns='Summary').replace(dict(zip(['《', '》', ' ,\t'], ['', '', '|'])), regex=True))
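A quick check of the df.update approach with made-up values (again, only 'Summary' comes from the question):

```python
import pandas as pd

# Toy frame for illustration
df = pd.DataFrame({'Summary': ['《keep》'], 'Other': ['《x》']})

# Replace in every column except 'Summary', then write the result back
df.update(df.drop(columns='Summary')
            .replace(dict(zip(['《', '》', ' ,\t'], ['', '', '|'])), regex=True))

print(df)
```

df.update aligns on index and columns, so only the cleaned columns are overwritten and 'Summary' is left untouched.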

Related

How to replace element in pandas DataFrame column [duplicate]

I have a column in my dataframe like this:
range
"(2,30)"
"(50,290)"
"(400,1000)"
...
and I want to replace the , comma with - dash. I'm currently using this method but nothing is changed.
org_info_exc['range'].replace(',', '-', inplace=True)
Can anybody help?
Use the vectorised str method replace:
df['range'] = df['range'].str.replace(',','-')
df
range
0 (2-30)
1 (50-290)
EDIT: so if we look at what you tried and why it didn't work:
df['range'].replace(',','-',inplace=True)
from the docs we see this description:
str or regex: str: string exactly matching to_replace will be replaced
with value
So because the str values do not match, no replacement occurs, compare with the following:
df = pd.DataFrame({'range':['(2,30)',',']})
df['range'].replace(',','-', inplace=True)
df['range']
0 (2,30)
1 -
Name: range, dtype: object
here we get an exact match on the second row and the replacement occurs.
For anyone else arriving here from Google search on how to do a string replacement on all columns (for example, if one has multiple columns like the OP's 'range' column):
Pandas has a built-in replace method available on a DataFrame object.
df.replace(',', '-', regex=True)
Source: Docs
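A short runnable sketch of the whole-frame replace, using made-up column names in the spirit of the OP's data:

```python
import pandas as pd

# Two string columns with embedded commas (sample data)
df = pd.DataFrame({'a': ['(2,30)'], 'b': ['(50,290)']})

# With regex=True the comma is replaced wherever it appears, in every column
out = df.replace(',', '-', regex=True)

print(out)
```

Without regex=True this would only replace cells that are exactly ",", which is the pitfall discussed above.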
If you only need to replace characters in one specific column and regex=True and inplace=True somehow both failed, I think this way will work:
data["column_name"] = data["column_name"].apply(lambda x: x.replace("characters_need_to_replace", "new_characters"))
Here lambda acts as a small function applied in a loop over the column: x represents each entry in the current column.
The only things you need to change are "column_name", "characters_need_to_replace" and "new_characters".
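Filling in the placeholders with made-up values, the apply/lambda pattern looks like this:

```python
import pandas as pd

# Sample data; 'column_name' stands in for your real column
data = pd.DataFrame({'column_name': ['a,b', 'c,d']})

# x is each string in the column; replace is the plain str.replace
data['column_name'] = data['column_name'].apply(lambda x: x.replace(',', '-'))

print(data)
```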
Replace all spaces with underscores in the column names:
data.columns = data.columns.str.replace(' ', '_', regex=True)
In addition, for those looking to replace more than one character in a column, you can do it using regular expressions:
import re
chars_to_remove = ['.', '-', '(', ')']
regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'
df['string_col'] = df['string_col'].str.replace(regular_expression, '', regex=True)
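A self-contained run of the character-class idea on invented sample strings:

```python
import re
import pandas as pd

# Characters to strip; re.escape makes them safe inside the character class
chars_to_remove = ['.', '-', '(', ')']
pattern = '[' + re.escape(''.join(chars_to_remove)) + ']'

df = pd.DataFrame({'string_col': ['(2-30).', 'a-b.c']})
cleaned = df['string_col'].str.replace(pattern, '', regex=True)

print(cleaned)
```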
Almost similar to the answer by Nancy K, this works for me. Note that inside apply on a Series, x is a plain string, so call replace on it directly rather than through .str:
data["column_name"] = data["column_name"].apply(lambda x: x.replace("characters_need_to_replace", "new_characters"))
If you want to remove two or more elements from a string, example the characters '$' and ',' :
Column_Name
===========
$100,000
$1,100,000
... then use:
data.Column_Name.str.replace("[$,]", "", regex=True)
=> ['100000', '1100000']

I tried to add a filter for the Countries but it gives me an error [duplicate]

I have a dataframe with a lot of columns in it. Now I want to select only certain columns. I have saved all the names of the columns that I want to select into a Python list and now I want to filter my dataframe according to this list.
I've been trying to do:
df_new = df[[list]]
where list includes all the column names that I want to select.
However I get the error:
TypeError: unhashable type: 'list'
Any help on this one?
You can remove one []:
df_new = df[list]
It's also better to use a name other than list, since list shadows the built-in, e.g. L:
df_new = df[L]
It looks like it's working; I'll only simplify it:
L = []
for x in df.columns:
    if not "_" in x[-3:]:
        L.append(x)
print(L)
List comprehension:
print ([x for x in df.columns if not "_" in x[-3:]])
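Putting the selection together end to end, with made-up column names:

```python
import pandas as pd

# Sample frame; 'b_x' ends with an underscore pattern and should be dropped
df = pd.DataFrame({'a': [1], 'b_x': [2], 'c': [3]})

# Keep columns whose last three characters contain no underscore
L = [x for x in df.columns if not "_" in x[-3:]]
df_new = df[L]

print(df_new)
```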

How to remove part of the column name?

I have a DataFrame with several columns like:
'clientes_enderecos.CEP', 'tabela_clientes.RENDA','tabela_produtos.cod_ramo', 'tabela_qar.chave', etc
I want to rename the columns by removing the text before the dot, together with the dot itself.
I only know the method pandas.rename({'A':'a','B':'b'})
But to name them as they are now I used:
df_tabela_clientes.columns = ["tabela_clientes." + str(col) for col in df_tabela_clientes.columns]
How could I reverse the process?
Try rename with lambda and string manipulation:
df = pd.DataFrame(columns=['clientes_enderecos.CEP', 'tabela_clientes.RENDA','tabela_produtos.cod_ramo', 'tabela_qar.chave'])
print(df)
#Empty DataFrame
#Columns: [clientes_enderecos.CEP, tabela_clientes.RENDA, tabela_produtos.cod_ramo, #tabela_qar.chave]
#Index: []
dfc = df.rename(columns=lambda x: x.split('.')[-1])
print(dfc)
#Empty DataFrame
#Columns: [CEP, RENDA, cod_ramo, chave]
#Index: []
You can split the column names on the dot and keep whichever side of the dot you want.
import pandas as pd
df = pd.DataFrame(columns=['clientes_enderecos.CEP', 'tabela_clientes.RENDA','tabela_produtos.cod_ramo', 'tabela_qar.chave'])
df.columns = [name.split('.')[0] for name in df.columns] # 0: before the dot | 1:after the dot

Drop Rows if value of subset is [] from dtype Object

My DataFrame looks like this:
(screenshot of the DataFrame)
I found this but it didn't work for me:
Drop rows containing empty cells from a pandas DataFrame
I tried:
df_model['Zustand'].replace("[]", np.nan, inplace=True)
df_model.dropna(subset=['Zustand'], inplace=True)
"[]" is dtype Object. I am not sure how to deal with it.
Greets
Daniele
If the column contains 'list' objects:
df_model['Zustand'] = df_model['Zustand'].apply(lambda x: np.nan if len(x) == 0 else x)
df_model.dropna(subset=['Zustand'], inplace=True)
Let me use my own data, since you didn't provide any:
df_model=pd.DataFrame({'BildName':['D850', 'D876', 'D839','ETC'], 'Zustand':['Beto','kelo','[]','[]']})
Try:
df_model['Zustand'] = df_model['Zustand'].str.replace(r"\[\]", "", regex=True)
df_model['Zustand'].replace('', np.nan, inplace=True)
df_model.dropna(subset=['Zustand'], inplace=True)
or
apply a mask:
m = df_model.Zustand.str.contains(r'\[\]')
Then change the matched rows as you wanted:
df_model.loc[m, 'Zustand'] = np.nan
df_model.dropna(inplace=True)
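Running the mask approach end to end on the sample data from above:

```python
import numpy as np
import pandas as pd

df_model = pd.DataFrame({'BildName': ['D850', 'D876', 'D839', 'ETC'],
                         'Zustand': ['Beto', 'kelo', '[]', '[]']})

# Mark rows whose 'Zustand' is the literal string "[]" (brackets escaped)
m = df_model['Zustand'].str.contains(r'\[\]')
df_model.loc[m, 'Zustand'] = np.nan
df_model.dropna(subset=['Zustand'], inplace=True)

print(df_model)
```

Only the two rows with real values survive.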

Pandas - string replace but element is a list of strings

This line lets me replace the substring /data/ in every row of the column path with "../datasets/"
df['path']=df['path'].astype(str).str.replace("/data/","../datasets/")
What if every row of the column path contains a list of strings e.g. ["/data/1","/data/2"] ? How can I use replace?
for example df['path'][0] should go from ["/data/1","/data/2"] to ["../datasets/1","../datasets/2"]
Use apply:
df = pd.DataFrame({
    'path': [["/data/1", "/data/2"]]
})
df['path'] = df['path'].apply(lambda lst: [s.replace('/data/', '../datasets/') for s in lst])
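A quick end-to-end check that the list comprehension inside apply produces the expected lists:

```python
import pandas as pd

df = pd.DataFrame({'path': [["/data/1", "/data/2"]]})

# apply receives each cell (a list of strings); rebuild it with str.replace
df['path'] = df['path'].apply(
    lambda lst: [s.replace('/data/', '../datasets/') for s in lst])

print(df['path'][0])
```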