Pandas: printing to csv also creates an unlabelled column with row indices

I use the following code to write a dataframe to a CSV file, based on the answers to this question:
How to avoid Python/Pandas creating an index in a saved csv?
y_insincere = [classify_text(text,trimmed_posterior_dict)<0 for text in X_test]
X_output = pd.DataFrame()
X_output['number'] = number
X_output['CASE'] = case
X_output.to_csv('submission.csv',header=True,columns = ['id','case'],index='False')
However, when I look at the CSV file, it has an extra column without a header containing row indices. I tried other fixes from the above question, but nothing worked. I am stuck; any help is appreciated.
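A likely cause, sketched below with placeholder data: index='False' passes a non-empty string, which is truthy, so pandas still writes the index; passing the boolean False suppresses it. The columns list should also name columns that actually exist in the frame ('number' and 'CASE' here, not 'id' and 'case').

```python
import pandas as pd

# Placeholder data standing in for the asker's `number` and `case` lists
X_output = pd.DataFrame()
X_output['number'] = [1, 2, 3]
X_output['CASE'] = ['a', 'b', 'c']

# index=False must be the boolean, not the string 'False' (any non-empty
# string is truthy, so the index column would still be written), and the
# `columns` list must name columns that actually exist in the frame.
X_output.to_csv('submission.csv', header=True,
                columns=['number', 'CASE'], index=False)
```

With the boolean in place, the output file starts directly with the header row and has no unlabelled index column.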

Related

Error in appending first row of a dataframe to a csv file template

I am trying to append a dataframe to an existing CSV template file.
I'm facing a problem with the first row of data only. Instead of the first row being written on the second line, below the column headings, it is written adjacent to the column headings, with the last column heading merged into the first-row data. I've been unable to resolve this and am seeking help.
import shutil, pandas as pd
original_File = 'File1.csv'
target_File = "File2.csv"
shutil.copyfile(original_File, target_File)
FinalDF = pd.DataFrame()
FinalDF["Item"]= None
FinalDF["Value"] = None
FinalDF.loc["0"] = ["Book",300]
FinalDF.loc["1"] = ["Calculator",1000]
print(FinalDF)
FinalDF.to_csv('File2.csv',mode='a', index=False, header=False)
The issue was with the template file. When the template CSV was opened on a MacBook, the cursor remained active in the last column-heading cell even after the file was closed.
When the dataframe was appended to File2 (a replica of File1), the first row continued from the last column-heading cell, but the succeeding rows were appended normally.
I therefore opened the same CSV on a Windows system, deleted all of the extra columns and rows in the template to be on the safe side, and saved it.
I have not faced the issue since.
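The symptom described, the first appended row continuing on the header line, is typical of a template whose last line lacks a trailing newline. A sketch that guards against this in code rather than by hand-editing the file (file names as in the question, data made up):

```python
import shutil
import pandas as pd

# Stand-in template whose header line lacks a trailing newline,
# reproducing the symptom from the question
with open('File1.csv', 'w') as f:
    f.write('Item,Value')          # note: no '\n' at the end

original_File = 'File1.csv'
target_File = 'File2.csv'
shutil.copyfile(original_File, target_File)

# If the copied template does not end with a newline, the first appended
# row would continue on the header line; add one if it is missing.
with open(target_File, 'rb+') as f:
    f.seek(0, 2)                   # jump to the end of the file
    if f.tell() > 0:
        f.seek(-1, 2)
        if f.read(1) != b'\n':
            f.write(b'\n')

FinalDF = pd.DataFrame({'Item': ['Book', 'Calculator'],
                        'Value': [300, 1000]})
FinalDF.to_csv(target_File, mode='a', index=False, header=False)
```

After the guard, every appended row lands on its own line regardless of how the template was saved.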

How to maintain a MultiIndex view in pandas while converting a pandas dataframe to a csv file?

I have a pandas dataframe with a MultiIndex (pic 1), but when I convert it to CSV it is not shown as a MultiIndex (pic 2).
I am using to_csv(). Is there any parameter I need to pass to get the right format?
(pic 1 and pic 2: screenshots not reproduced)
I tried as per the suggestion; the result is in the picture (not reproduced).
If you're not bothered about getting a CSV as output, the way I do this is by putting the data in an XLSX file instead.
# Create the workbook to save the data within
workbook = pd.ExcelWriter(xlsx_filename, engine='xlsxwriter')
# Create sheets in excel for data
df.to_excel(workbook, sheet_name='Sheet1')
# Save the changes (ExcelWriter.save() was removed in pandas 2.0; close() works on all versions)
workbook.close()
Can you try this and see if it formats how you want?
Maybe this can be helpful for you:
pd.DataFrame(df.columns.tolist()).T.to_csv("dataframe.csv", mode="w", header=False, index=False)
df.to_csv("dataframe.csv", mode="a", header=False, index=False)
I guess you are using an older version of pandas. In versions before 0.21.0 there is a tupleize_cols parameter you can play with. On newer versions, just save with to_csv(): by default it writes each level of the column index as its own header row.
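For reference, a round-trip sketch on recent pandas with made-up data: to_csv() writes one header row per column-index level, and read_csv() with header=[0, 1] restores the MultiIndex on the way back in (CSV itself has no merged-cell concept, so the "view" only reappears once the file is parsed).

```python
import pandas as pd

# Hypothetical frame with a two-level column MultiIndex
cols = pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# to_csv writes one header row per level of the column index
df.to_csv('multi.csv')

# Reading it back with header=[0, 1] rebuilds the MultiIndex
df2 = pd.read_csv('multi.csv', header=[0, 1], index_col=0)
```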

Pandas dataframe to extract data from csv file separated by markers and save into excel sheets

I have a CSV file containing time-based data. A "Marker" column identifies idle time (value 0) and a set of useful data-collection steps (values 1 and 2 represent step 1 and step 2); the data repeats the idle and collection steps multiple times. I would like to loop over the CSV file checking the value of the "Marker" column, separate out the useful data steps, and save each set of data into a separate Excel sheet.
I have the following code in mind:
n = len(df)
i = 0
newdf = []
for i in range(0, n):
    if df.MARKER[i] == 1:
        newdf.append(df.iloc[:, i])
    if df.MARKER[i] == 0:
        end
return newdf
I have not thought about the remaining part of the code yet, since this part does not run.
It looks like you're just trying to filter to elements where MARKER == 1. If that's the case, you could just do newdf = df[df['MARKER'] == 1]
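If the goal is to go further and keep each contiguous run of non-zero markers as its own set (so that each set could go to its own Excel sheet), one way, sketched here with made-up data and assuming a MARKER column as described, is to label the runs with a cumulative sum and group on it:

```python
import pandas as pd

# Hypothetical time-based data: MARKER 0 = idle, 1/2 = collection steps
df = pd.DataFrame({'MARKER': [0, 0, 1, 1, 2, 0, 1, 2, 2, 0],
                   'value':  [9, 9, 1, 2, 3, 9, 4, 5, 6, 9]})

active = df['MARKER'] != 0
# A new run starts wherever `active` flips from False to True;
# the cumulative sum of those flips labels the runs 1, 2, ...
run_id = (active & ~active.shift(fill_value=False)).cumsum()

# Collect each contiguous run of non-zero markers
runs = {rid: chunk.reset_index(drop=True)
        for rid, chunk in df[active].groupby(run_id[active])}

# Each run could then be written to its own sheet, e.g.:
# with pd.ExcelWriter('runs.xlsx') as writer:
#     for rid, chunk in runs.items():
#         chunk.to_excel(writer, sheet_name=f'run_{rid}', index=False)
```

This avoids the row-by-row loop entirely; the boundary between sets is detected vectorially from the marker column.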

iterating through a dataframe column to see if individual cell value is in a list of file paths

I have a list of shapefile paths (sf_list), and I'm using a nested for loop to iterate through the list and a dataframe (df). For each row, I check whether the value in the name column appears in a path; if it does, I want to write that path into that row in a new column of the dataframe (sf_path). This is what I have now:
for sf in sf_list:
    for row in df.iterrows():
        df.loc[df['name'].isin(sf), 'sf_path'] = [sf]
The script runs, but the new column is empty. The list is populated with all of the paths I need, and the name column contains the specific text that appears in the path I want to populate that row of the new column with. Any direction appreciated.
UPDATE:
Alright now I have:
for sf in sf_list:
    for row in dlrules_df.iterrows():
        dlrules_df.loc[dlrules_df['dl_foldername'] in sf, 'sf_path'] = sf
Error returned:
TypeError: 'in <string>' requires string as left operand, not Series
Can you give this a try? apply isn't recommended, but it has become quite a habit for me. I would like to spend more time to give you a more efficient solution, but it's already bedtime here and this popped out of the back of my head.
sf = [list_of_folder_paths]
dlrules_df.loc[:, 'dl_foldername'].apply(lambda x: sf[sf.index(x)] if x in sf else None)
PS: Not tested, so it may break somewhere but I hope it gives you some idea.
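Since the task is substring matching (a name contained somewhere inside a path) rather than exact membership, neither isin nor sf.index(x) will find anything. A sketch with hypothetical paths and names:

```python
import pandas as pd

# Hypothetical shapefile paths and dataframe
sf_list = ['/data/rivers/nile.shp', '/data/lakes/erie.shp']
df = pd.DataFrame({'name': ['nile', 'erie', 'amazon']})

def first_matching_path(name, paths):
    """Return the first path that contains `name`, else None."""
    return next((p for p in paths if name in p), None)

# For each row, find a path that contains the row's name as a substring
df['sf_path'] = df['name'].apply(lambda n: first_matching_path(n, sf_list))
```

Rows whose name matches no path are left as None/NaN, which makes the misses easy to inspect afterwards.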

How to append to an existing sheet and empty a dataFrame for new data

This might be a noob question, but how can I append to a sheet instead of overwriting the existing data? And then, how can I empty a dataframe so it can be populated with new data?
Basically, I am reading in a file, populating a dataframe, writing it to a sheet, and then emptying the dataframe so it is empty to read in new data.
I am stuck at emptying the dataframe:
avgs = avgs.drop(['Period start','Period end','zone','usid','site id','rank','Total LCQI Impact','LTE BLOCK Impact','LTE DROP Impact','LTE TPUT Impact','engineer notes'],axis=1)
And appending to the sheet.
avgs.to_excel("pandas_out.xlsx",merge_cells=False) ## need to append to file
You can use avgs = pd.DataFrame() to empty an existing dataframe. If you want to preserve the column names, try avgs = pd.DataFrame(columns=avgs.columns) instead.
Regarding appending to the sheet, there are many ways to do it, but they all follow the same steps: first pd.read_excel(), then append the new rows, then df.to_excel() again. For ways to append, see pd.concat(); note that pd.Series.append() and pd.DataFrame.append() were removed in pandas 2.0.
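Putting both pieces together, a sketch with hypothetical file and column names: read the existing sheet, concatenate the new rows with pd.concat() (DataFrame.append() is gone in pandas 2.x), write the combined frame back, and reset the frame for the next batch.

```python
import pandas as pd

# Create a hypothetical existing workbook with one row already on the sheet
pd.DataFrame({'site': ['A'], 'avg': [1.0]}).to_excel('pandas_out.xlsx',
                                                     index=False)

# New batch of data to append
avgs = pd.DataFrame({'site': ['B'], 'avg': [2.0]})

# Read what is already on the sheet, concatenate the new rows, write back
existing = pd.read_excel('pandas_out.xlsx')
combined = pd.concat([existing, avgs], ignore_index=True)
combined.to_excel('pandas_out.xlsx', index=False)

# Empty the frame but keep its column names for the next batch
avgs = pd.DataFrame(columns=avgs.columns)
```

Reading and rewriting the whole sheet is slower than a true append, but it sidesteps the overwrite behaviour of to_excel() without needing writer-specific append modes.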