How to set the missing-value marker for the first row (the column headers) when using DataFrame.to_csv(sys.stdout, na_rep='NULL') - pandas

I want to set the missing-value marker.
I tried data.to_csv(sys.stdout, na_rep='NULL'), but it doesn't apply to the first row.

That blank cell is your index's name, which is unset; na_rep does not fill it in.
Either set the index name with df.index.name = 'yourname', or leave the index out of the output with df.to_csv(index=False).
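A minimal sketch of both options, assuming a small made-up frame (the column names a and b and the index name id are only illustrative). It builds the CSV as a string rather than writing to sys.stdout so the header lines can be compared:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "y"]})

# Default: the index has no name, so the header row starts with an empty cell.
default_csv = df.to_csv(na_rep="NULL")

# Option 1: name the index so the first header cell is filled.
named = df.copy()
named.index.name = "id"
named_csv = named.to_csv(na_rep="NULL")

# Option 2: leave the index out of the output entirely.
no_index_csv = df.to_csv(index=False, na_rep="NULL")

# Print only the header line of each variant.
print(default_csv.splitlines()[0])
print(named_csv.splitlines()[0])
print(no_index_csv.splitlines()[0])
```

In all three variants the missing value in column a is still written as NULL; only the header row differs.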

Related

Python: How to create a new boolean column in my dataframe if the value of another column is in a list

I have a dataframe and I want to create a new column that takes the value 1 if the value of another column is in a list, and 0 otherwise. I tried this, but it did not work. Thank you.
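One common way to do this, sketched here with made-up data (the column name city and the list wanted are assumptions, not taken from the question), is Series.isin followed by a cast to int:

```python
import pandas as pd

# Hypothetical data: flag the rows whose 'city' appears in the list.
df = pd.DataFrame({"city": ["Paris", "Oslo", "Rome", "Lima"]})
wanted = ["Paris", "Rome"]

# isin() returns a boolean Series; astype(int) turns True/False into 1/0.
df["in_list"] = df["city"].isin(wanted).astype(int)

print(df)
```

Dropping the astype(int) call keeps the column as True/False, which may be preferable if a real boolean column is wanted.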

Is there a way to give column names to pd.read_clipboard(), as it is treating the first row of data as column names?

I am using the pd.read_clipboard() function to read an Excel table that doesn't have column names in its first row. The dataframe returned uses the first row as column labels. How can I fix that?
I would like the result to be
and not this
Although it does not show up in the help for read_clipboard(), passing names works: read_clipboard(names=['c1', 'c2']), where c1 and c2 are the column names. Providing the column names stops the function from treating the first row as column names.
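read_clipboard() forwards its keyword arguments to read_csv(), so the effect of names can be sketched without a clipboard. The example below is an illustration under that assumption: it reads a made-up tab-separated string with read_csv() in place of the clipboard contents.

```python
import io
import pandas as pd

# Hypothetical tab-separated text, as copied from a spreadsheet, with no header row.
copied = "1\tapple\n2\tbanana\n"

# Without names, the first data row is consumed as the column labels.
wrong = pd.read_csv(io.StringIO(copied), sep="\t")

# With names, every row is kept as data and the given labels are used.
right = pd.read_csv(io.StringIO(copied), sep="\t", names=["c1", "c2"])

print(wrong)
print(right)
```

With real clipboard data the equivalent call would be pd.read_clipboard(names=['c1', 'c2']), as in the answer above.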

OpenRefine: add a counter to a value

I'm using a text facet to get only the rows that include a certain value. With the resulting rows, I'd like to fill a column with values from another column. This is what I'm doing:
cells["Auto_Objektkennung"].value
How can I append a running number, starting at 0, to every value?
Pseudocode:
cells["Auto_Objektkennung"].value + '-' + COUNTER+1
Unfortunately, the row index does not help: because of the text facet, I'm not starting at one but somewhere around 8000.
cells["Auto_Objektkennung"].value + '-' + row.record.index
Here is a manual way to achieve this with OpenRefine and GREL without delegating the task to Clojure or Jython.
Idea: We can first create records based on a text facet or text filter.
Then we can use row.record.index to create the expected "continuing number[s]".
Recipe:
With your text facet (or filter) active, add a new column named "record_marker".
Move the new column "record_marker" to the beginning.
Add a new column "counter" using the expression row.record.index - 1.
Blank down the new "counter" column.
You can now use the "counter" column in your expression.
if(cells["counter"].value >= 0, cells["Auto_Objektkennung"].value + "-" + cells["counter"].value, "")
Clean up by deleting the "record_marker" and "counter" columns.

How to identify the first record in a column - PySpark

I have a dataframe with many columns.
I have to identify the first record of a column and assign it one value, and assign a different value to all the other records,
i.e.
if first record: df[price] = df[amt]
else:
df[price] = df[amt] + df[delivery_charges]
How do I identify the first record in a column/dataframe?
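You can do this in the following way:
from pyspark.sql import Window
from pyspark.sql import functions as f

# Number the rows in 'Id' order; row 1 is the first record.
window = Window.orderBy('Id')
(
    df.withColumn('row', f.row_number().over(window))
    .withColumn('price', f.when(f.col('row') == 1, f.col('amt')).otherwise(f.col('amt') + f.col('delivery_charges')))
    .show()
)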

removing double quotes from column header in pandas dataframe

I am trying to understand the intuition behind the following line of code.
I know that it is removing double quotes from the column header within a dataframe, although can anyone please help me understand how it is doing that?
df.columns = [col[1:-1] for col in df.columns]
Thanks
df.columns = ... is the part of the line that assigns the list on the right-hand side to the columns.
The right-hand side is a list comprehension, which can be read like a for loop.
In Python, a string is a sequence of characters. for col in df.columns means you iterate over each string in the list of column names, so each col is a string. If that string has quotes, it looks like "xxxx", so its first and last characters are quotes.
col[1:-1] slices the string from the second character through the second-to-last one.
Putting these pieces together, in your case you end up removing the quotes.
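The steps above can be sketched on a small made-up frame (the column names are hypothetical). Note that the slice drops the first and last characters whether or not they are quotes, so str.strip('"') is shown as an alternative that only touches actual quote characters; that alternative is a suggestion here, not part of the original line:

```python
import pandas as pd

# Hypothetical frame whose headers were read in with literal double quotes.
df = pd.DataFrame([[1, 2]], columns=['"id"', '"price"'])

# The line from the question: drop the first and last character of each name.
sliced = [col[1:-1] for col in df.columns]

# Slicing is unconditional: an unquoted name loses real characters.
unquoted_sliced = "price"[1:-1]

# Alternative: strip only leading/trailing double quotes, if present.
stripped = [col.strip('"') for col in ['"id"', "price"]]

df.columns = sliced
print(list(df.columns), unquoted_sliced, stripped)
```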
This takes the characters from the 2nd through the 2nd-to-last position of each column's name.
Example: for instance, '"price"'[1:-1] gives 'price'.
If a column name is shorter than 3 characters, it gives you an empty string as the column name.
Example: for instance, 'ab'[1:-1] gives ''.