The column in the CSV that comes from the index of the DataFrame does not have a header name - pandas

Here is a pandas DataFrame:
>>> print(df)
   A  B  C
0  0  1  2
1  3  4  5
2  6  7  8
With df.to_csv('df.csv') I get a file in which the column that comes from the DataFrame index has no header name. Is it possible to specify a name for that column with pandas?

Try rename_axis:
df.rename_axis('index').to_csv('df.csv')
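A minimal end-to-end sketch of the same idea (the label 'index' is just an example; any name works):

import pandas as pd

df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]], columns=list('ABC'))

# Naming the index makes to_csv write a header for the first column.
df.rename_axis('index').to_csv('df.csv')
# df.csv now starts with the line: index,A,B,C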

Related

Subtract a specific row from a csv using python

I have two csv files: one containing data, the other containing a single row with the same columns as the first file. I am trying to subtract that single row from every row of the first file using pandas.
I have tried the following, but to no avail.
df = df.subtract(row, axis=1)
You're looking for the "drop" method. From pandas docs:
df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
drop by index:
df.drop([0, 1])
   A  B   C   D
2  8  9  10  11
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
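As a rough sketch of how that looks end to end (the file names here are hypothetical):

import pandas as pd

df = pd.read_csv('data.csv')

# Drop the unwanted rows by their index labels, then write the result back out.
df = df.drop([0, 1])
df.to_csv('data_without_rows.csv', index=False)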

python - List of Lists into pandas dataframe including name of columns

I would like to turn a list of lists into a dataframe, with one column per inner list. This part is still easy:
list = [[....],[....],[...]]
df = pd.DataFrame(list)
df = df.transpose()
The problem is: I would like to give the columns a column-name based on entries I have in another list:
list_two = [A,B,C,...]
This is the issue I'm still struggling with.
Is there any approach to solve this problem?
Thanks a lot in advance for your help.
Best regards
Sascha
Use zip with dict to build a dictionary of lists and pass it to DataFrame:
import pandas as pd

L = [[1,2,3,5],[4,8,9,8],[1,2,5,3]]
list_two = list('ABC')
df = pd.DataFrame(dict(zip(list_two, L)))
print (df)
   A  B  C
0  1  4  1
1  2  8  2
2  3  9  5
3  5  8  3
Or pass the list as the index parameter and transpose; the column names then come from that list:
df = pd.DataFrame(L, index=list_two).T
print (df)
   A  B  C
0  1  4  1
1  2  8  2
2  3  9  5
3  5  8  3
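For completeness (this is not part of the answer above), the asker's original transpose approach also works if the column names are assigned afterwards:

import pandas as pd

L = [[1,2,3,5],[4,8,9,8],[1,2,5,3]]
list_two = list('ABC')

# Each inner list becomes a column after the transpose; then name the columns.
df = pd.DataFrame(L).transpose()
df.columns = list_two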

pandas read_csv is returning an extra unknown column

I am creating a csv file from a pandas dataframe by combining two lists using:
df = pd.DataFrame(list(zip(patients_full, labels)),
                  columns=['id', 'cancer'])
df.to_csv("labels.csv")
but when I read the csv back there is an unknown Unnamed column that shows up. How do I remove it?
    Unnamed: 0          id  cancer
0            0  HF0953.npy       1
1            1  HF1058.npy       3
2            2  HF1071.npy       3
3            3  HF1122.npy       3
4            4  HF1235.npy       1
5            5  HF1280.npy       2
6            6  HF1344.npy       1
7            7  HF1463.npy       1
8            8  HF1489.npy       1
9            9  HF1490.npy       2
10          10  HF1587.npy       2
11          11  HF1613.npy       2
This is happening because of the index column that to_csv("labels.csv") saves by default. Since the index of the DataFrame you saved had no name, when you read the file back with read_csv("labels.csv") that column is treated like any other, but with a blank header that becomes Unnamed: 0. To avoid this you have two options:
Option 1 - read that column back in as the index rather than as data:
pd.read_csv("labels.csv", index_col=0)
Option 2 - don't save the index in the first place:
df.to_csv("labels.csv", index=False)
That extra column in your output is the index of the dataframe. To not include it in the output, use df.to_csv('labels.csv', index=False). More information on that method is available in the pandas docs.
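A minimal round-trip sketch of the index=False option (the two sample lists below are hypothetical stand-ins for patients_full and labels):

import pandas as pd

patients_full = ['HF0953.npy', 'HF1058.npy']
labels = [1, 3]

df = pd.DataFrame(list(zip(patients_full, labels)), columns=['id', 'cancer'])

# Writing without the index means no Unnamed: 0 column when reading it back.
df.to_csv('labels.csv', index=False)
print(pd.read_csv('labels.csv'))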

How to convert column values of a csv file to a different structure in pandas?

I have 100 csv files in a folder. Each of them has a Name column, whose value matches the number in the file name, and a Z column of values.
import pandas as pd
df = pd.read_csv("ProfileGraph1.csv")
df.head()
   Name         Z
0     1 -3.687422
1     1 -3.688351
2     1 -3.684376
3     1 -3.691209
4     1 -3.693000
df = pd.read_csv("ProfileGraph2.csv")
df.head()
   Name         Z
0     2 -3.691955
1     2 -3.694228
2     2 -3.692559
3     2 -3.699092
4     2 -3.698381
df = pd.read_csv("ProfileGraph3.csv")
df.head()
   Name         Z
0     3 -3.693265
1     3 -3.694765
2     3 -3.693598
3     3 -3.697865
4     3 -3.699872
I would like to go through each of them, convert the Z column of each csv file into a row, and append all of those rows to a new csv file. This is the output that I made manually:
df = pd.read_csv("filename.csv")
df.head()
   Name         1         2         3         4         5
0     1 -3.687422 -3.688351 -3.684376 -3.691209 -3.693000
1     2 -3.691955 -3.694228 -3.692559 -3.699092 -3.698381
2     3 -3.693265 -3.694765 -3.693598 -3.697865 -3.699872
First loop over the list of all files and create one big DataFrame with concat, then reshape with unstack, using cumcount as the counter:
import glob
import pandas as pd

files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp) for fp in files]
df = pd.concat(dfs, ignore_index=True)

# Number the rows within each Name group, then pivot those counters into columns.
df = df.set_index(['Name', df.groupby('Name').cumcount()])['Z'].unstack().reset_index()
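A self-contained sketch of the same reshape on the three example frames from the question, built in memory here instead of read from disk. Note that cumcount starts at 0, so the value columns come out numbered 0-4 rather than 1-5 as in the manually made example:

import pandas as pd

dfs = [
    pd.DataFrame({'Name': 1, 'Z': [-3.687422, -3.688351, -3.684376, -3.691209, -3.693000]}),
    pd.DataFrame({'Name': 2, 'Z': [-3.691955, -3.694228, -3.692559, -3.699092, -3.698381]}),
    pd.DataFrame({'Name': 3, 'Z': [-3.693265, -3.694765, -3.693598, -3.697865, -3.699872]}),
]

df = pd.concat(dfs, ignore_index=True)

# cumcount numbers the rows within each Name group; unstack pivots those counters into columns.
out = df.set_index(['Name', df.groupby('Name').cumcount()])['Z'].unstack().reset_index()
print(out)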

How to check dependency of one column on another in a pandas dataframe

I have the following dataframe:
import pandas as pd
df = pd.DataFrame([[1,11,'a'],[1,12,'a'],[1,11,'a'],[1,12,'a'],[1,7,'a'],[1,12,'a']])
df.columns=['id','code','name']
df
   id  code name
0   1    11    a
1   1    12    a
2   1    11    a
3   1    12    a
4   1     7    a
5   1    12    a
As shown in the above dataframe, the value of column "id" is directly related to the value of column "name". If I have, say, a million records, how can I know whether one column is totally dependent on another column in a dataframe?
If they are totally dependent, then their factorizations will be the same
(df.id.factorize()[0] == df.name.factorize()[0]).all()
True
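A quick sketch of that check on the example frame, comparing id against both name and code:

import pandas as pd

df = pd.DataFrame([[1,11,'a'],[1,12,'a'],[1,11,'a'],[1,12,'a'],[1,7,'a'],[1,12,'a']],
                  columns=['id','code','name'])

# Equal factorization codes mean the two columns group the rows identically.
print((df.id.factorize()[0] == df.name.factorize()[0]).all())   # True
print((df.id.factorize()[0] == df.code.factorize()[0]).all())   # False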