I have a DF that looks like this:
          value
objectID
ab798     54.68
ab799     45.98
ab800     38.79
etc..     etc..
where "value" is accessible as a column but "objectID" isn't; it's as if the DF has been indexed by "objectID". I want objectID to be a column header like value, so that I can access all of its rows (ab798, ab799, etc.) by calling df.objectID.
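You can reset the index, which turns it into a regular column (reset_index returns a new DataFrame, so assign the result):
df = df.reset_index()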
Or, you can access it as an index:
df.index.values
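A minimal sketch of both options, using a small frame built to mirror the one in the question. The construction of df and the name flat are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical frame mirroring the question: "objectID" is the index name.
df = pd.DataFrame(
    {"value": [54.68, 45.98, 38.79]},
    index=pd.Index(["ab798", "ab799", "ab800"], name="objectID"),
)

# reset_index() returns a new frame; the index becomes a regular column.
flat = df.reset_index()

print(flat.columns.tolist())   # ['objectID', 'value']
print(flat.objectID.tolist())  # ['ab798', 'ab799', 'ab800']

# Alternatively, read the labels straight from the index of the original frame.
print(df.index.values)
```

After the reset, attribute access such as flat.objectID works because objectID is now an ordinary column.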
My code to save the df is:
fdi_out_vdem.to_csv("fdi_out_vdem.csv")
The code to read the df into Python is:
fdi_out_vdem = pd.read_csv("C:/Users/asus/Desktop/classen/fdi_out_vdem.csv")
The df:
Unnamed: 0  country_name  value
         1         Spain    190
         2         Spain    311
Your df has two columns, but also an index with "0" and "1". When writing it to csv it looks like this:
,country_name,value
0,Spain,190
1,Spain,311
When you import it with pandas, it is treated as a df with 3 columns (the first of which has no name, so pandas labels it "Unnamed: 0").
You have two possibilities here:
Save it without index column:
df.to_csv("fdi_out_vdem.csv", index=False)
df = pd.read_csv("C:/Users/asus/Desktop/classen/fdi_out_vdem.csv")
Or save it with the index column and specify the index column when reading it with pd.read_csv:
df.to_csv("fdi_out_vdem.csv")
df = pd.read_csv("C:/Users/asus/Desktop/classen/fdi_out_vdem.csv", index_col=[0])
UPDATE
As recommended by @ouroboros1 in the comments, you could also name your index before saving it to csv, so that you can specify the index column by that name:
df.index.name = "index"
df.to_csv("fdi_out_vdem.csv")
df = pd.read_csv("C:/Users/asus/Desktop/classen/fdi_out_vdem.csv", index_col="index")
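To see the effect without touching the disk, here is a small sketch that round-trips the frame through an in-memory buffer. The use of io.StringIO and the names csv_text, naive, no_index and with_index are assumptions for illustration; the question itself writes to a file path.

```python
import io
import pandas as pd

df = pd.DataFrame({"country_name": ["Spain", "Spain"], "value": [190, 311]})

# By default to_csv() writes the index as an unnamed first column.
csv_text = df.to_csv()
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns.tolist())  # ['Unnamed: 0', 'country_name', 'value']

# Option 1: do not write the index at all.
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Option 2: keep the index in the file and tell read_csv which column holds it.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(no_index.columns.tolist())    # ['country_name', 'value']
print(with_index.columns.tolist())  # ['country_name', 'value']
```

Either option avoids the extra column; which one fits depends on whether the index carries information worth keeping.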
You can either pass the parameter index_col=[0] to pandas.read_csv:
fdi_out_vdem = pd.read_csv("C:/Users/asus/Desktop/classen/fdi_out_vdem.csv", index_col=[0])
Or even better, get rid of the index at the beginning when calling pandas.DataFrame.to_csv:
fdi_out_vdem.to_csv("fdi_out_vdem.csv", index=False)
I am trying to reposition the index column in the output CSV from pandas DataFrame.to_csv()
I can order the non index columns using columns but it is unclear how to move the index column.
If I have two columns, Name and Age, plus the index, I want the columns to come out in the following order in the resulting CSV: Name, Age, index.
Anyone know how to do this?
The index cannot be moved; it is always the first column of a DataFrame or Series (or Panel, which has since been removed from pandas). But you can copy the data from the index into another column.
If you need the index as the last column:
df['new_last'] = df.index
If you need the new column at a custom position:
df.insert(2, 'new', df.index)
And finally, to prevent the index from being written to the csv (thanks @Vivek Kalyanarangan):
df.to_csv(file, index=False)
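Putting the pieces together, a sketch that produces the Name, Age, index order asked for in the question. The sample rows, the index labels and the in-memory buffer are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical data with a non-default index so the result is easy to spot.
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [31, 45]}, index=[10, 20])

out = df.copy()
out["index"] = out.index  # copy the index values into a new last column

buffer = io.StringIO()
out.to_csv(buffer, index=False)  # skip the real index so it is not written twice
print(buffer.getvalue())
```

Working on a copy leaves the original frame untouched; if that does not matter, the column can be added to df directly.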
I have a pandas dataframe with 10 columns. I would like to add a column which will uniquely identify every row. I do have to come up with the unique value (it could be as simple as a running sequence). How can I do this? I tried adding the index as a column itself, but for some reason I get a KeyError when I do this.
Add a column built from a range the length of your index:
df['new'] = range(1, len(df.index)+1)
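A short sketch of the running-sequence approach. It also shows one way a KeyError can arise; the question does not show the failing code, so treating it as a df['index'] lookup is an assumption. The frame and the column names row_id and from_index are hypothetical.

```python
import pandas as pd

# Hypothetical frame with string labels as its index.
df = pd.DataFrame({"a": [5, 6, 7]}, index=["x", "y", "z"])

# Running sequence starting at 1, independent of the index labels.
df["row_id"] = range(1, len(df) + 1)

# The index is not a column, so looking it up by name raises KeyError.
try:
    df["index"]
except KeyError:
    print("no column named 'index'")

# The index values themselves can be copied into a column instead.
df["from_index"] = df.index

print(df["row_id"].tolist())      # [1, 2, 3]
print(df["from_index"].tolist())  # ['x', 'y', 'z']
```

The copied index is only a unique identifier if the index itself has no duplicates, whereas the running sequence is unique by construction.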
I have a Python dictionary and I created a panda data frame like below:
I want to change the name of the index column to date, but I couldn't do this with data.set_index('date'). How can I do this? Any advice would be appreciated.
data.set_index('date') only makes an existing column named 'date' the index of the dataframe data; it does not rename the current index.
You can set the name of the index using data.index.name = 'date'
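For illustration, a sketch of naming the index. The dictionary from the question is not shown, so the sample data is invented; rename_axis is included as an alternative that was not part of the original answer.

```python
import pandas as pd

# Hypothetical data; the question's dictionary is not shown.
data = pd.DataFrame({"price": [1.5, 2.5]}, index=["2020-01-01", "2020-01-02"])

# Set the index name in place.
data.index.name = "date"

# rename_axis returns a new frame and leaves data unchanged.
renamed = data.rename_axis("day")

print(data.index.name)     # date
print(renamed.index.name)  # day
```

The in-place assignment is the shortest route; rename_axis is convenient inside a method chain.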
I have a data frame and use some of its columns to group by:
grouped = df.groupby(['col1', 'col2'])
Now I use mean function to get a new data frame object from the above created groupby object:
df_new = grouped.mean()
Now I have two data frames (df and df2) and I would like to merge them using col1 and col2. The problem is that df2 does not have these columns: after the groupby operation, col1 and col2 are "shifted" to the index. To resolve this, I tried to create these columns:
df2['col1'] = df2['index'][0]
df2['col2'] = df2['index'][1]
But it does not work because 'index' is not recognized as a column of the data frame.
As an alternative to Andy Hayden's method, you could use as_index=False to keep the grouping keys as columns rather than as index levels:
df2 = df.groupby(['col1', 'col2'], as_index=False).mean()
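A small sketch contrasting the two behaviours. The sample frame, the val column and the names indexed and flat are made up for illustration.

```python
import pandas as pd

# Hypothetical data with two grouping columns and one numeric column.
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "val": [1.0, 3.0, 5.0],
})

# Default: the grouping keys become levels of a MultiIndex.
indexed = df.groupby(["col1", "col2"]).mean()

# as_index=False: the grouping keys stay as ordinary columns.
flat = df.groupby(["col1", "col2"], as_index=False).mean()

print(list(indexed.index.names))  # ['col1', 'col2']
print(flat.columns.tolist())      # ['col1', 'col2', 'val']
```

With the keys kept as columns, the result can be merged on col1 and col2 directly.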
You can use left_index (or right_index) arguments of merge:
left_index : boolean, default False
Use the index from the left DataFrame as the join key(s).
If it is a MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels
and use right_on (or left_on, when joining on the right frame's index) to specify which columns the index should be matched against.
So it'll be something like:
pd.merge(df, df_new, left_on=['col1', 'col2'], right_index=True)
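As a rough end-to-end sketch of that merge: the sample data is invented, and the mean column is renamed to val_mean (a hypothetical name) only to avoid a clash with the original val column.

```python
import pandas as pd

# Hypothetical data with two grouping columns and one numeric column.
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "val": [1.0, 3.0, 5.0],
})

# Group means, indexed by a (col1, col2) MultiIndex.
df_new = df.groupby(["col1", "col2"]).mean().rename(columns={"val": "val_mean"})

# Match df's columns against the levels of df_new's index.
merged = pd.merge(df, df_new, left_on=["col1", "col2"], right_index=True)
print(merged)
```

The number of columns in left_on has to match the number of levels in the right frame's index, as the quoted documentation says.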