This question already has an answer here:
Pandas dataframe replace string in multiple columns by finding substring
(1 answer)
Closed 11 months ago.
I have a dataset with several columns containing numbers, and I need to remove the ',' thousand separator.
For example: 123,456.15 -> 123456.15.
I tried to get it done with multi-indexes the following way:
toProcess = ['col1','col2','col3']
df[toProcess] = df[toProcess].str.replace(',','')
Unfortunately, the error is: 'DataFrame' object has no attribute 'str'. A DataFrame doesn't have a .str accessor, but a Series does.
How can I achieve this efficiently?
Here is a working approach that iterates over the columns:
toProcess = ['col1','col2','col3']
for col in toProcess:
    df[col] = df[col].str.replace(',', '')
Use:
df[toProcess] = df[toProcess].replace(',','', regex=True)
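A self-contained sketch of this approach with made-up sample data; the pd.to_numeric step at the end is an extra assumption, in case the cleaned columns should end up numeric rather than remain strings:

```python
import pandas as pd

# Hypothetical frame with comma thousand separators stored as strings
df = pd.DataFrame({
    'col1': ['123,456.15', '7,890.00'],
    'col2': ['1,000', '2,500'],
    'col3': ['9,999.99', '10,000.01'],
})

toProcess = ['col1', 'col2', 'col3']

# DataFrame.replace with regex=True works across all columns at once,
# unlike the Series-only .str.replace accessor
df[toProcess] = df[toProcess].replace(',', '', regex=True)

# Optional: convert the cleaned strings to actual numbers
df[toProcess] = df[toProcess].apply(pd.to_numeric)

print(df)
```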
This question already has answers here:
Pandas dataframe with multiindex column - merge levels
(4 answers)
Closed 2 months ago.
I want to set all the column names in one line. How should I do it?
I tried many things, including renaming the columns, but couldn't get it to work.
You need to flatten your multiindex column header.
df.columns = df.columns.map('_'.join)
Or, using an f-string in a list comprehension:
df.columns = [f'{i}_{j}' if j else f'{i}' for i, j in df.columns]
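A runnable sketch with a hypothetical groupby/agg result, since that is a common source of MultiIndex column headers; the f-string version keeps single-level names like 'a' free of a trailing underscore:

```python
import pandas as pd

# Hypothetical frame whose aggregation produces a MultiIndex column header
df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
agg = df.groupby('a').agg({'b': ['sum', 'mean']}).reset_index()

# agg.columns is a MultiIndex like [('a', ''), ('b', 'sum'), ('b', 'mean')]
# Flatten it, skipping the empty second level for single-level columns
agg.columns = [f'{i}_{j}' if j else f'{i}' for i, j in agg.columns]

print(agg.columns.tolist())
```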
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 4 months ago.
I am new to writing code and am currently working on a project to compare two columns of an Excel sheet using Python and return the rows that do not match.
I tried using the .isin function and was able to identify the values by comparing the columns; however, I am not sure how to print the actual rows where the result is False.
For Example:
import pandas as pd
data = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram","Elise Dolton"]
df = pd.DataFrame(data, columns=['Names'])
df
data1 = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram"]
df1 = pd.DataFrame(data1, columns=['Names'])
df1
data_compare = df["Names"].isin(df1["Names"])
for data in data_compare:
    if data == False:
        print(data)
However, this only prints False. I want to know that index 8 returned False, something like the below format.
Could you please advise how I can modify the code to print the index and the name of each row that returned False?
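A sketch of one way to get the index and name together, using boolean indexing with an inverted mask, as the linked duplicates describe:

```python
import pandas as pd

data = ["Darcy Hayward", "Barbara Walters", "Ruth Fraley", "Minerva Ferguson",
        "Tad Sharp", "Lesley Fuller", "Grayson Dolton", "Fiona Ingram", "Elise Dolton"]
df = pd.DataFrame(data, columns=['Names'])

data1 = ["Darcy Hayward", "Barbara Walters", "Ruth Fraley", "Minerva Ferguson",
         "Tad Sharp", "Lesley Fuller", "Grayson Dolton", "Fiona Ingram"]
df1 = pd.DataFrame(data1, columns=['Names'])

# ~ inverts the boolean mask, so this keeps only the non-matching rows
missing = df[~df["Names"].isin(df1["Names"])]

# Each non-matching row prints with its original index
for idx, name in missing["Names"].items():
    print(idx, name)
```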
This question already has answers here:
How to split a dataframe string column into two columns?
(11 answers)
Closed 2 years ago.
I want to split the df['Business Location'] column into two columns, df['longitude'] and df['latitude']. This:
df[['longitude','latitude']] = df['Business Location'].str.split(',')
is giving me error:
ValueError: Must have equal len keys and value when setting with an iterable
How do I split it correctly?
This will work (note the result must be assigned back, since assign returns a new DataFrame):
df = df.assign(
    longitude=lambda x: x['Business Location'].str.split(',').str[0],
    latitude=lambda x: x['Business Location'].str.split(',').str[1])
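A runnable sketch with hypothetical coordinate strings. The expand=True form is an alternative to the assign approach that also fixes the original ValueError directly, since it makes str.split return a DataFrame whose columns line up with the assignment target:

```python
import pandas as pd

# Hypothetical sample data in "number,number" string form
df = pd.DataFrame({'Business Location': ['37.77,-122.42', '40.71,-74.01']})

# expand=True returns a two-column DataFrame instead of a Series of lists,
# so the "Must have equal len keys" error goes away
df[['longitude', 'latitude']] = df['Business Location'].str.split(',', expand=True)

print(df)
```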
This question already has answers here:
Where do you need to use lit() in Pyspark SQL?
(2 answers)
Closed 2 years ago.
I have two pieces of code here:
gooddata = gooddata.withColumn("Priority",when(gooddata.years_left < 5 & (gooddata.Years_left >= 0),lit("CRITICAL"))).fillna("LOW").show(5)
gooddata=gooddata.withColumn("Priority",when((gooddata.Years_left < 5) & (gooddata.Years_left >= 0),"CRITICAL").otherwise("LOW")).show(5)
For both Spark and PySpark, you need lit() for:
literals in certain statements
comparing with nulls
getting the name of a dataframe column instead of the contents of the dataframe column
E.g.
val nonNulls = df.columns.map(x => when(col(x).isNotNull, concat(lit(","), lit(x))).otherwise(",")).reduce(concat(_, _))
from question: Add a column to spark dataframe which contains list of all column names of the current row whose value is not null
val df2 = df.select(col("EmpId"),col("Salary"),lit("1").as("lit_value1"))
This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 3 years ago.
I am trying to rewrite the following in one line using a list comprehension. I want to select only the cells that contain the substring '[edit]'. ut is my dataframe, and the column I want to select from is 'col1'. Thanks!
for u in ut['col1']:
    if '[edit]' in u:
        print(u)
I expect the following output:
Alabama[edit]
Alaska[edit]
Arizona[edit]
...
If the output of a Pandas Series is acceptable, then you can just use .str.contains, without a loop. Note that '[edit]' contains regex metacharacters, so pass regex=False to match it literally, and select the column with .loc so you get a Series back rather than a DataFrame:
s = ut.loc[ut["col1"].str.contains("[edit]", regex=False), "col1"]
If you need to print each element of the Series separately, then loop over it:
for i in s:
    print(i)
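A self-contained sketch with made-up data resembling the expected output:

```python
import pandas as pd

# Hypothetical data mixing state headers (ending in '[edit]') with other rows
ut = pd.DataFrame({'col1': ['Alabama[edit]', 'Auburn', 'Alaska[edit]',
                            'Anchorage', 'Arizona[edit]']})

# regex=False treats '[edit]' as a literal substring rather than a
# regex character class (which would match any single 'e', 'd', 'i', or 't')
s = ut.loc[ut['col1'].str.contains('[edit]', regex=False), 'col1']

for i in s:
    print(i)
```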