When I import an Excel file as a DataFrame in pandas and try to get rid of the first column, I'm unable to do it even though I give index=None. What am I missing?
I ran into this, too. You can make the 'friend' column the index instead. That gets rid of the original index that comes with pd.read_excel(). As @ALollz says, a DataFrame always has an index; you get to choose what's in it.
data.set_index('friend', inplace=True)
See examples in the documentation on DataFrame.set_index().
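A minimal sketch (the file name is hypothetical, and it assumes the sheet has a 'friend' column):

import pandas as pd

# Hypothetical file name; assumes the sheet has a 'friend' column.
data = pd.read_excel('friends.xlsx')
data.set_index('friend', inplace=True)

# Alternatively, consume the leftover first column as the index while reading:
data = pd.read_excel('friends.xlsx', index_col=0)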
I have a function that returns tuples. When I apply it to my pandas DataFrame using the apply() function, the results look this way.
The Date here is an index and I am not interested in it.
I want to create two new columns in a dataframe and set their values to the values you see in these tuples.
How do I do this?
I tried the following:
This errors out, citing a mismatch between expected and available values. It is seeing these tuples as a single entity, so the two columns I specified on the left-hand side are a problem. It's expecting only one.
And what I need is to break it down into two parts that can be used to set two different columns.
What's the correct way to achieve this?
Make your function return a pd.Series; it will be expanded into a DataFrame.
orders.apply(lambda x: pd.Series(myFunc(x)), axis=1)
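A self-contained sketch, with a hypothetical myFunc standing in for the original tuple-returning function:

import pandas as pd

orders = pd.DataFrame({'qty': [2, 3], 'price': [5.0, 4.0]})

def myFunc(row):
    # Returns a 2-tuple, like the function in the question.
    return row['qty'] * row['price'], row['qty'] - row['price']

# Returning a Series (here with the target column names as its index)
# makes apply() expand the result into two columns.
orders[['a', 'b']] = orders.apply(lambda x: pd.Series(myFunc(x), index=['a', 'b']), axis=1)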
Alternatively, use zip:
orders['a'], orders['b'] = zip(*orders['your_column'])
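With the same hypothetical orders and myFunc as above, and assuming the tuples were first stored in an intermediate column:

# apply() leaves plain tuples alone, so this is a Series of tuples.
orders['pair'] = orders.apply(myFunc, axis=1)
orders['a'], orders['b'] = zip(*orders['pair'])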
I tried to do this:
districtident.rename(columns={'leaid':'nces'},inplace=True)
and it failed:
Other things that didn't work:
districtident = districtident.rename(columns={'leaid':'nces'})
Renaming the column names of pandas dataframe is not working as expected - python
renaming columns in pandas doesn't do anything
Those weren't great either.
Here's an obvious appeal:
Alas, no.
Restarting the kernel didn't work either. The only thing that worked was:
districtident['nces'] = districtident['leaid']
districtident.drop(['leaid'], axis=1, inplace=True)
But that's really not the best approach, I feel, especially if I need to do this for a number of columns.
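For what it's worth, when rename() silently does nothing, one common cause is hidden whitespace in the labels, so the key 'leaid' never matches. A sketch under that assumption:

# repr() exposes stray spaces such as ' leaid'
print([repr(c) for c in districtident.columns])

# Strip the whitespace, then rename as many columns as needed in one call:
districtident.columns = districtident.columns.str.strip()
districtident = districtident.rename(columns={'leaid': 'nces'})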
I am trying to convert a PySpark DataFrame to pandas, so I simply write df1 = df.toPandas(), and I get the error "ValueError: ordinal must be >= 1". Unfortunately, I don't see any other useful information in the error message (it's quite long, so I cannot post it here).
If somebody has an idea of what could be wrong, that would be nice.
I have only seen this error when a PySpark DataFrame had multiple columns with the same name, but that is not the case this time.
Thanks in advance.
Edit: I have experimented and found that the problem appears only if I select certain specific columns. But I don't see what could be wrong with those columns.
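The message itself usually comes from Python's date handling (datetime.date.fromordinal raises "ordinal must be >= 1"), so malformed or out-of-range dates in a timestamp column are a plausible suspect. A diagnostic sketch that converts one column at a time to isolate the culprit:

# Try each column separately and report which ones fail to convert.
for c in df.columns:
    try:
        df.select(c).toPandas()
    except ValueError as exc:
        print(c, exc)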
I am reading an excel file into pandas using pd.ExcelFile.
It reads correctly and I can print the dataframe. But when I try to select a subset of columns like:
subdf = origdf[['CUTOMER_ID', 'ASSET_BAL']]
I get error:
KeyError: "['CUTOMER_ID' 'ASSET_BAL'] not in index"
Do I need to define some kind of index here? When I printed the df, I verified that the columns are there.
Ensure that the columns actually exist in the dataframe. For example, you have written CUTOMER and not CUSTOMER, which I assume is the correct name.
You can verify the column names by using list(origdf.columns.values).
And for when you don't have a typo problem, here is a solution:
Use loc instead:
subdf = origdf.loc[:, ['CUSTOMER_ID', 'ASSET_BAL']].values
(I'd be glad to learn why this one works, though.)
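A guess as to why: older pandas versions let .loc silently reindex when a list contained missing labels (filling NaN) instead of raising, while recent versions raise a KeyError there too. A forward-compatible way to select only the columns that actually exist:

# Keep only the requested labels that are really present.
wanted = ['CUSTOMER_ID', 'ASSET_BAL']
subdf = origdf[origdf.columns.intersection(wanted)]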
I noticed a mechanism of auto-inserting when selecting rows by index. To illustrate, I use the following code:
Then I have two questions (maybe they are the same):
Is there any documentation of this mechanism? (I have tried to find it but cannot, even in the long, long official docs.)
How can I avoid the auto-inserting? For example, I want the last line of code to return only the 'a' row.
Thank you very much in advance!
I have not seen any documentation. It looks like an unintended artifact. I can think of some clever things to do with it but I wouldn't trust it.
Workaround:
df1.loc[pd.Index([1, 'a']).intersection(df1.index), :]
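A self-contained illustration (the original setup code was not shown, so the frame here is hypothetical). Note that newer pandas versions raise a KeyError for list indexers with missing labels, which makes the intersection spelling the forward-compatible one:

import pandas as pd

# Hypothetical stand-in for the frame from the question.
df1 = pd.DataFrame({'x': [10, 20]}, index=[1, 'a'])

# 'b' is not in the index; intersecting first drops it instead of
# inserting a NaN row (old behavior) or raising (current behavior).
df1.loc[pd.Index([1, 'a', 'b']).intersection(df1.index), :]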