Python: How to create a new boolean column in my dataframe if the value of another column is in a list

I have a dataframe and I want to create a new column which takes the value 1 if the value of another column is in a list, and 0 otherwise. I tried this but it did not work. Thank you.
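The attempted code is not shown in the question, so the following is only a minimal sketch of one common approach, assuming pandas. The column name `city`, the list `wanted`, and the sample data are hypothetical and chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe is not shown in the question.
df = pd.DataFrame({"city": ["Paris", "Lyon", "Nice", "Lille"]})

# Hypothetical list of values to test membership against.
wanted = ["Paris", "Nice"]

# isin() returns a boolean Series; astype(int) turns True/False into 1/0.
df["in_list"] = df["city"].isin(wanted).astype(int)

print(df)
```

Dropping `.astype(int)` keeps the column as True/False instead of 1/0, if a genuinely boolean column is preferred.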

Related

How to set the missing value in dataframe.to_csv(sys.stdout, na_rep='NULL') for the first row of the dataframe containing the column headers (the highlighted portion)

I tried data.to_csv(sys.stdout, na_rep='NULL'), but it doesn't apply to the first row.
The empty field at the start of the header row is your index's name. na_rep only fills in missing data values, so it is not applied there.
Either set the index name with df.index.name = 'yourname', or leave the index out of the output with df.to_csv(index=False).
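A short sketch of both options, assuming pandas and NumPy. The sample dataframe, the index name `id`, and the helper `to_csv_text` are hypothetical. The CSV is returned as a string rather than written to sys.stdout so the header row can be inspected.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

def to_csv_text(frame, **kwargs):
    """Hypothetical helper: return the CSV as a string with na_rep='NULL'."""
    return frame.to_csv(na_rep="NULL", **kwargs)

# Default: the unnamed index leaves an empty first field in the header row.
default_csv = to_csv_text(df)

# Option 1: give the index a name so the header field is filled.
named = df.copy()
named.index.name = "id"
named_csv = to_csv_text(named)

# Option 2: drop the index column from the output entirely.
no_index_csv = to_csv_text(df, index=False)

print(default_csv.splitlines()[0])
print(named_csv.splitlines()[0])
print(no_index_csv.splitlines()[0])
```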

Is there a way to give column names to pd.read_clipboard(), as it is treating the first row of data as column names?

I am using the pd.read_clipboard() function to get an Excel table that doesn't have column names in its first row. The dataframe returned uses the first row as column labels. How can I fix that?
I would like results to be
and not this
Although it does not show up in the help for read_clipboard(), you can pass the column names yourself, e.g. read_clipboard(names=['c1','c2']), where c1 and c2 are the column names. Extra keyword arguments are forwarded to read_csv(), so supplying names stops the function from treating the first row as column names.
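Since the clipboard is awkward to reproduce, here is a sketch of the same effect using read_csv() on an in-memory string, which is where read_clipboard() forwards its keyword arguments. The tab-separated sample text stands in for copied Excel cells and is hypothetical, as are the names c1 and c2.

```python
import io

import pandas as pd

# Hypothetical stand-in for two rows copied from Excel (tab-separated, no header).
clipboard_text = "1\tapple\n2\tbanana\n"

# Without names, the first data row is consumed as the header.
wrong = pd.read_csv(io.StringIO(clipboard_text), sep="\t")

# With names, every row is kept as data and the given labels are used.
right = pd.read_csv(io.StringIO(clipboard_text), sep="\t", names=["c1", "c2"])

print(wrong)
print(right)
```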

How do you create a df column using str.contains to assign rows to a category?

I want to add a column to a df and I want the values for the rows of that column to be a category based on strings contained in another attribute in the data.
For example, if another attribute contains a "P" or "S" I want the string within the new df column to say cat1. But if the other attribute contains an "O" I want the string in the df column to say cat2. Otherwise, I want the string to be cat3.
So essentially, I want to create a df column that considers what the string in another df column is and then assigns the row value based on what that string contains.
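One possible way to express these rules, sketched with pandas and NumPy. The column name `code` and the sample values are hypothetical. np.select() checks the conditions in order, so a string containing both "P" and "O" ends up as cat1 here, which is an assumption about the intended precedence.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "code" stands in for the other attribute.
df = pd.DataFrame({"code": ["P12", "XS", "O7", "ZZ", None]})

# na=False makes missing values count as "does not contain".
is_cat1 = df["code"].str.contains("P|S", na=False)
is_cat2 = df["code"].str.contains("O", na=False)

# The first matching condition wins; anything else falls through to cat3.
df["category"] = np.select([is_cat1, is_cat2], ["cat1", "cat2"], default="cat3")

print(df)
```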

How to identify the first record in a column in PySpark

I have a dataframe with many columns.
I have to identify the first record of a column and assign it one value, and assign another value to all the other records,
i.e.
if the row is the first record:
    df[price] = df[amt]
else:
    df[price] = df[amt] + df[delivery_charges]
How do I identify the first record in a column/dataframe?
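You can do this in the following way, numbering the rows with a window ordered by Id so that the first record is the one where row equals 1:
from pyspark.sql import Window
import pyspark.sql.functions as f
window = Window.orderBy('Id')
df.withColumn('row',f.row_number().over(window)).withColumn('price',f.when(f.col('row')==1,f.col('amt')).otherwise(f.col('amt')+f.col('delivery_charges'))).show()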

Add numbers down a column in OpenRefine

I'd like to automatically number a column, similar to Excel, where I can type "1" in one cell and the cells below it automatically get numbered 2, 3, 4, 5, etc. I don't know why I'm having so much trouble figuring out this function in OpenRefine, but any help would be greatly appreciated.
Thanks,
Gail
You can add a new column ("Add new column based on this column") with this GREL expression:
row.index + 1
The answer by Ettore Rizza already provides a solution for the common case. As the question author stated in a comment, it does not work for their use case: they want to add consecutive numbers to unfiltered rows.
For this you can use records. The basic idea is to create records from the filtered data and use the record index as counter.
Steps:
1. With filters active, add a new column with the expression value.
2. Move the new column to the beginning to use it as records.
3. With filters still active, add a new column (or transform the first one) with the expression row.record.index + 1.
Original | Filtered | Records | Index
A        | A        | A       | 1
1        |          |         |
2        |          |         |
B        | B        | B       | 2
C        | C        | C       | 3
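The numbering that the record-based steps produce can be imitated in plain Python, purely to illustrate the logic. This is not OpenRefine code. It assumes the original column holds A, 1, 2, B, C and that the active filter keeps the non-numeric values; both are assumptions read from the example values above.

```python
# Assumed contents of the original column.
original = ["A", "1", "2", "B", "C"]

# Assumed filter: keep only the non-numeric values.
keep = [not value.isdigit() for value in original]

# Give each kept row the next consecutive number; skipped rows stay empty.
index = []
counter = 0
for kept in keep:
    if kept:
        counter += 1
        index.append(counter)
    else:
        index.append(None)

for value, number in zip(original, index):
    print(value, number)
```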