Summing up columns in a pandas DataFrame, but the result is not displayed correctly - pandas

data_int['$5,000-6,999'] = data_int.iloc[1:5,6:8].sum(axis=1)
I tried to sum columns 5 and 7, '5000-5999' and '6000-6999' respectively. However, the numbers are not displayed correctly: they show up in scientific notation (e+...). Please help! Thanks!
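A possible explanation, offered as a sketch rather than a diagnosis of the actual data: the positional slice 6:8 selects the columns at positions 6 and 7, and assigning a sum computed on rows 1:5 back to the whole frame leaves NaN in the other rows, which forces a float column that pandas may print in scientific notation. The example below assumes two hypothetical columns named '$5,000-5,999' and '$6,000-6,999' with made-up values, selects them by name, and changes only the display format.

```python
import pandas as pd

# Hypothetical data standing in for the two income-bracket columns.
data_int = pd.DataFrame({
    "$5,000-5,999": [1_250_000.0, 2_400_000.0, 3_100_000.0],
    "$6,000-6,999": [1_750_000.0, 2_600_000.0, 900_000.0],
})

# Select the two columns by name and add them row by row.
data_int["$5,000-6,999"] = data_int[["$5,000-5,999", "$6,000-6,999"]].sum(axis=1)

# Change only how floats are printed; the stored values are unchanged.
pd.set_option("display.float_format", "{:,.0f}".format)
print(data_int)
```

If the sums are whole numbers and contain no missing values, converting the column with astype(int) is another way to avoid the float display.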

Related

Extract columns with range in Google Colab

I want to extract some columns, maybe just 10, from a dataframe with 30 columns, but I can't find any code or function to do it. I tried iloc but did not get good results. Please help; here is my dataframe:
So I just want to get the columns 1 to 10:
df1_10 = df.columns['1'....'10']
If you want to fetch 10 columns from your dataset, use this piece of code:
df.iloc[:,1:11] # this will give you 10 columns
df.iloc[:,1:10] # this will give you only 9 columns.
# This is what you use in your code that's why you don't get the desired result.
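To make the slicing rules concrete, here is a small sketch on a hypothetical 30-column frame whose columns are named "0" to "29" (the real column names are not shown in the question). Positional slices with iloc exclude the end position, while label slices with loc include the end label.

```python
import pandas as pd

# Hypothetical frame with 30 columns named "0" to "29".
df = pd.DataFrame([range(30)], columns=[str(i) for i in range(30)])

first_ten = df.iloc[:, :10]      # positions 0 to 9 (the first 10 columns)
one_to_ten = df.iloc[:, 1:11]    # positions 1 to 10 (end position excluded)
by_label = df.loc[:, "1":"10"]   # labels "1" to "10" (end label included)

print(first_ten.shape, one_to_ten.shape, by_label.shape)
```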

Within a Pandas dataset, how do I keep only the rows that have a minimum of 4 values, deleting the rest

I am using a pandas dataset and need to delete rows that have 3 or fewer values. So if there are 12 columns and only 9 are populated with information, that row needs to be deleted.
If this is confusing let me know and I will explain it another way.
Thanks good people
Edit
This is the code I have tried so far. It gives a syntax error.
indexnames = dataset.row[<= 3].index
dataset.drop(indexnames, inplace=True)
Try this:
New_df= Base_df.iloc[Base_df[Base_df.isnull().sum(axis=1)>3].index]
New_df
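If the goal is the one stated in the title (keep only rows with at least 4 non-missing values), DataFrame.dropna with the thresh parameter may be a simpler route. This is a sketch on made-up data; the column names and values are assumptions, and the threshold should be adjusted if the intended rule is different.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 to 3 hold 5, 2, 4 and 3 non-missing values.
dataset = pd.DataFrame({
    "a": [1, np.nan, 1, 1],
    "b": [2, np.nan, 2, 2],
    "c": [3, 3, 3, 3],
    "d": [4, np.nan, np.nan, np.nan],
    "e": [5, 5, 5, np.nan],
})

# Keep rows that have at least 4 non-missing values.
kept = dataset.dropna(thresh=4)

# Equivalent boolean mask, useful if the rule needs to be customised.
kept_mask = dataset[dataset.notna().sum(axis=1) >= 4]

print(kept)
```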

Creating a 2 level Multi Index

I'd like to create a multi index where: the first column is 'ID', the second column is Revenue and EBITDA, and the 5 different dates as columns.
Please see image of the dataframe attached. Thanks guys
https://i.stack.imgur.com/Rit5j.png
Got it eventually:
dataframe = dataframe.pivot_table(index=['ID'],values= ['Revenue','EBITDA'],columns='Date').stack(0,1)
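For readers without access to the image, here is a minimal sketch of the same idea on invented data. The IDs, dates, and amounts are assumptions; only the column names 'ID', 'Date', 'Revenue', and 'EBITDA' come from the post. It pivots first, then moves the metric names from the columns into the row index.

```python
import pandas as pd

# Hypothetical long-format data with two IDs and two dates.
dataframe = pd.DataFrame({
    "ID": ["A", "A", "B", "B"],
    "Date": ["2020-12-31", "2021-12-31", "2020-12-31", "2021-12-31"],
    "Revenue": [100.0, 110.0, 200.0, 220.0],
    "EBITDA": [10.0, 11.0, 20.0, 22.0],
})

# Columns become a two-level index: (metric, Date).
wide = dataframe.pivot_table(index="ID", values=["Revenue", "EBITDA"], columns="Date")

# Move the metric level into the rows: index is (ID, metric), columns are dates.
result = wide.stack(level=0)

print(result)
```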

How to group by and sum several columns?

I have a big dataframe with several columns which contain strings, numbers, etc. I am trying to group by SCENARIO and then sum only the columns between 2020 and 2050. All I have managed so far is to sum one column, as shown below, but I need to replace '2050' with the columns between 2020 and 2050.
df1 = df.groupby(["SCENARIO"])['2050'].sum().sum(axis=0)
You are creating a subset of the df with only that single column. I can't tell what your dataset looks like from the information provided, but try:
df.groupby(["SCENARIO"]).sum()
This should sum up the rows of every column within each group.
Alternatively, select the columns on which you want to perform the summation:
df.groupby(["SCENARIO"])[["column1","column2"]].sum()
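As a sketch of how the year columns could be picked automatically, the example below assumes the year columns are labelled with strings such as '2020' and '2050' (as the '2050' in the question suggests). The REGION column and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: one text column plus several year columns.
df = pd.DataFrame({
    "SCENARIO": ["base", "base", "high"],
    "REGION": ["north", "south", "north"],
    "2010": [9, 9, 9],
    "2020": [1, 2, 3],
    "2030": [4, 5, 6],
    "2050": [7, 8, 9],
})

# Keep only columns whose label is a year between 2020 and 2050 inclusive.
year_cols = [c for c in df.columns if c.isdigit() and 2020 <= int(c) <= 2050]

# Sum those columns within each scenario.
totals = df.groupby("SCENARIO")[year_cols].sum()

print(totals)
```

Calling totals.sum(axis=1) afterwards would give one grand total per scenario, if a single number per group is what is needed.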

counting each value in dataframe

I want to create a plot or graph. I have time series data.
My dataframe looks like this:
df.head()
I need to count the values in df['status'] (there are 4 different values) and df['group_name'] (2 different values) for each day.
So I want a date index and a count of how many times each value from df['status'] appears, and the same for df['group_name']. It should return a Series.
I used spam.groupby('date')['column'].value_counts().unstack().fillna(0).astype(int) and it is working as it should. Thank you all for the help.
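A small worked example of that approach, on invented data, may help future readers. The frame name spam and the column names come from the post; the dates and values are assumptions. Passing fill_value=0 to unstack is a variation that avoids the separate fillna and astype steps.

```python
import pandas as pd

# Hypothetical time series data with one row per event.
spam = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02"]),
    "status": ["open", "open", "closed", "open"],
    "group_name": ["g1", "g2", "g1", "g1"],
})

# One row per date, one column per distinct value, cells hold the counts.
status_counts = spam.groupby("date")["status"].value_counts().unstack(fill_value=0)
group_counts = spam.groupby("date")["group_name"].value_counts().unstack(fill_value=0)

print(status_counts)
print(group_counts)
```

Each result has a date index, so calling .plot() on it is one way to draw a line per value (this needs matplotlib installed).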