how do I split 1 str column into 2 columns in a pandas dataframe [duplicate] - pandas

This question already has answers here:
How to split a dataframe string column into two columns?
(11 answers)
Closed 2 years ago.
df['Business Location']  # column I want to split into two columns:
df['longitude']  # and
df['latitude']
df[['longitude','latitude']] = df['Business Location'].str.split(',')
This gives me the error:
ValueError: Must have equal len keys and value when setting with an iterable
How do I split it?

This will work (note that assign returns a new DataFrame, so the result has to be assigned back):
df = df.assign(
    longitude=lambda x: x['Business Location'].str.split(',').str[0],
    latitude=lambda x: x['Business Location'].str.split(',').str[1],
)
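A shorter alternative is to pass expand=True, so str.split returns a DataFrame that can be assigned to both target columns at once. A minimal sketch, assuming a 'Business Location' column holding hypothetical "longitude,latitude" strings:

```python
import pandas as pd

# Hypothetical sample data: "longitude,latitude" strings
df = pd.DataFrame({"Business Location": ["-122.41,37.77", "-73.99,40.73"]})

# expand=True makes str.split return a DataFrame with one column per part,
# which can be assigned to two columns in a single step
df[["longitude", "latitude"]] = df["Business Location"].str.split(",", expand=True)

print(df[["longitude", "latitude"]])
```

This avoids the ValueError from the question, which happens because str.split without expand=True returns a single Series of lists.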

Related

Create a new column with value 1/0 based on other column value in pandas [duplicate]

This question already has answers here:
Adding a new pandas column with mapped value from a dictionary [duplicate]
(1 answer)
Pandas conditional creation of a series/dataframe column
(13 answers)
Mapping values in place (for example with Gender) from string to int in Pandas dataframe [duplicate]
(3 answers)
Closed 9 hours ago.
I want to create a column with values 1 for female, 0 for male based on the gender column in Pandas.
Is using a for loop efficient?
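A Python-level for loop works but is slow on large frames; the usual vectorized approach is Series.map with a dictionary. A minimal sketch, assuming a hypothetical 'gender' column holding the strings 'female' and 'male':

```python
import pandas as pd

df = pd.DataFrame({"gender": ["female", "male", "female"]})

# map applies the dictionary lookup element-wise, with no Python-level loop;
# values missing from the dict become NaN
df["gender_code"] = df["gender"].map({"female": 1, "male": 0})

print(df["gender_code"].tolist())  # [1, 0, 1]
```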

split number based df of one column into 2 columns based on white space [duplicate]

This question already has answers here:
Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries
(3 answers)
How to split a dataframe string column into two columns?
(11 answers)
Closed 4 months ago.
According to the docs https://pandas.pydata.org/docs/reference/api/pandas.Series.str.split.html, I want to split this one column of numbers into 2 columns based on the default whitespace. However, the following doesn't appear to do anything.
self.data[0].str.split(expand=True)
The df has shape (1, 1), but I would like to split it into (1, 2).
Output:
             0
0  1.28353e-02 3.24985e-02
Desired output:
             0            1
0  1.28353e-02  3.24985e-02
PS: I don't want to specifically create columns A and B.
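str.split returns a new object rather than modifying the Series in place, which is why the call above appears to do nothing; the result has to be assigned back. A minimal sketch, using a standalone DataFrame in place of self.data:

```python
import pandas as pd

# Single-column DataFrame of whitespace-separated number strings
data = pd.DataFrame({0: ["1.28353e-02 3.24985e-02"]})

# With no separator argument, str.split splits on runs of whitespace;
# expand=True returns a DataFrame, and assigning it back is what
# actually changes `data`
data = data[0].str.split(expand=True)

print(data.shape)  # (1, 2)
```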

How to replace a character across multiple pandas columns [duplicate]

This question already has an answer here:
Pandas dataframe replace string in multiple columns by finding substring
(1 answer)
Closed 11 months ago.
I have a dataset with several columns containing numbers, and I need to remove the ',' thousands separator.
Here is an example: 123,456.15 -> 123456.15.
I tried to get it done with a multi-column selection the following way:
toProcess = ['col1','col2','col3']
df[toProcess] = df[toProcess].str.replace(',','')
Unfortunately, the error is: 'DataFrame' object has no attribute 'str'. A DataFrame doesn't have a str accessor, but a Series does.
How can I achieve this task efficiently?
Here is a working way, iterating over the columns:
toProcess = ['col1','col2','col3']
for col in toProcess:
    df[col] = df[col].str.replace(',','')
Or, operating on all columns at once:
df[toProcess] = df[toProcess].replace(',', '', regex=True)
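A small sketch of the regex-based approach, assuming the hypothetical column names col1–col3 hold number strings with comma thousands separators, followed by a conversion to numeric since the cleaned values are still strings:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["123,456.15", "7,000"],
    "col2": ["1,000.5", "2,500"],
    "col3": ["10", "20"],
})

toProcess = ["col1", "col2", "col3"]

# DataFrame.replace with regex=True applies the substitution inside
# every cell of the selected columns in one call
df[toProcess] = df[toProcess].replace(",", "", regex=True)

# The cleaned values are still strings; convert to float for arithmetic
df[toProcess] = df[toProcess].astype(float)

print(df["col1"].tolist())  # [123456.15, 7000.0]
```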

What is the use of lit() in Spark? The two pieces of code below return the same output, so what is the benefit of using lit()? [duplicate]

This question already has answers here:
Where do you need to use lit() in Pyspark SQL?
(2 answers)
Closed 2 years ago.
I have two pieces of code here:
gooddata = gooddata.withColumn("Priority",when(gooddata.years_left < 5 & (gooddata.Years_left >= 0),lit("CRITICAL"))).fillna("LOW").show(5)
gooddata=gooddata.withColumn("Priority",when((gooddata.Years_left < 5) & (gooddata.Years_left >= 0),"CRITICAL").otherwise("LOW")).show(5)
For both Spark and PySpark, lit() is needed for:
literals in certain statements
comparing with nulls
getting the name of a dataframe column instead of the contents of the dataframe column
E.g.
val nonNulls = df.columns.map(x => when(col(x).isNotNull, concat(lit(","), lit(x))).otherwise(",")).reduce(concat(_, _))
from question: Add a column to spark dataframe which contains list of all column names of the current row whose value is not null
val df2 = df.select(col("EmpId"),col("Salary"),lit("1").as("lit_value1"))

how to write list comprehension for selecting cells base on a substring [duplicate]

This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 3 years ago.
I am trying to rewrite the following in one line using a list comprehension. I want to select only the cells that contain the substring '[edit]'. ut is my dataframe, and the column I want to select from is 'col1'. Thanks!
for u in ut['col1']:
if '[edit]' in u:
print(u)
I expect the following output:
Alabama[edit]
Alaska[edit]
Arizona[edit]
...
If the output of a Pandas Series is acceptable, you can just use .str.contains without a loop. Pass regex=False so the brackets are matched literally rather than treated as a regex character class:
s = ut[ut["col1"].str.contains("[edit]", regex=False)]
If you need to print each element of the Series separately, loop over it:
for i in s:
    print(i)
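The one-line list comprehension the question asks for can be sketched like this, assuming ut['col1'] holds hypothetical strings such as 'Alabama[edit]':

```python
import pandas as pd

ut = pd.DataFrame({"col1": ["Alabama[edit]", "Montgomery", "Alaska[edit]"]})

# Keep only the cells whose string contains the literal substring '[edit]'
matches = [u for u in ut["col1"] if "[edit]" in u]

print(matches)  # ['Alabama[edit]', 'Alaska[edit]']
```

Note that this builds a plain Python list, whereas the .str.contains version keeps the result as a Series with its original index.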