Pandas dataframe: getting the average within two categories and placing it into an existing column - pandas

I have a dataframe df which is indexed by date (many rows share the same date). It has a column called name, which holds the company name for each date, a rating column (A to Z), a category column (health, utilities, etc.) and finally a column called price.
The price column has many blank values and some populated ones. I want to fill each blank with the average of the populated prices for companies that have the same rating and the same category as the row being filled.

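Try this (transform returns a result aligned to the original rows, so it can be assigned straight back to df):
df['adjPrice'] = df.groupby(['rating', 'category'])['price'].transform(lambda s: s.fillna(s.mean()))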
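To see the group-wise fill on a small example, here is a minimal sketch. The sample values and dates are made up; only the column names (name, rating, category, price) come from the question. It assumes blanks are stored as NaN, which is what mean() skips by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; repeated dates in the index, as in the question.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d", "e"],
        "rating": ["A", "A", "A", "B", "B"],
        "category": ["health", "health", "health", "utilities", "utilities"],
        "price": [10.0, np.nan, 20.0, np.nan, 8.0],
    },
    index=pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"]
    ),
)

# For each (rating, category) group, replace NaN with that group's mean price.
# transform keeps the original row order and index, so the assignment lines up.
df["adjPrice"] = df.groupby(["rating", "category"])["price"].transform(
    lambda s: s.fillna(s.mean())
)

print(df[["rating", "category", "price", "adjPrice"]])
```

If a group has no populated prices at all, its mean is NaN and those rows stay blank.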

Related

Add a new column in a dataframe from another dataframe using category column and then finding max for each category

I have a dataframe (DF2, a subset of DF1) with a category column and a count for each category. It was created from an original dataframe (DF1) which has a date column and 7 float columns. The category column was created from one of those float columns (continuous to categorical data). Now, for each of these categories, I need to find the max value of each of the remaining 6 columns and add them as new columns in DF2.

Pandas: Extracting data from sorted dataframe

Consider a dataframe with 2 columns: the first column is 'Name', a string, and the second is 'Score', an int. There are many duplicate names, and they are sorted so that all the 'Name1' rows are consecutive, followed by 'Name2', and so on. Each row may contain a different score. The number of duplicates may also differ for each unique name.
I wish to extract data from this dataframe and put it in a new dataframe such that there are no duplicate names in the name column, and each name's score is the average of its scores in the original dataframe.
I've provided a picture for a better visualization:
First, use the groupby() method, as mentioned by @QuangHong:
result = df.groupby('Name', as_index=False)['Score'].mean()
Then use the rename() method to give the averaged column a clearer name:
result = result.rename(columns={'Score': 'Avg Score'})
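The two steps can also be combined with pandas named aggregation. This is only a sketch on made-up scores; the column names 'Name' and 'Score' are taken from the answer's code, and 'Avg Score' is passed through `**{...}` because it contains a space.

```python
import pandas as pd

# Hypothetical sample data: sorted names with a varying number of duplicates.
df = pd.DataFrame(
    {
        "Name": ["Name1", "Name1", "Name2", "Name2", "Name2"],
        "Score": [80, 90, 70, 75, 95],
    }
)

# Named aggregation: output column "Avg Score" = mean of "Score" per name.
result = df.groupby("Name", as_index=False).agg(**{"Avg Score": ("Score", "mean")})

print(result)
```

Each name appears once in result, with its mean score in the 'Avg Score' column.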

Is there a way in pandas to compare two values within one column and sum how many times the second value is greater?

In my pandas dataframe, I have one column, score, whose rows are values such as [80,100], [90,100], etc. What I want to do is go through this column and, if the second value in the list is greater than the first value, count it, so that I end up with the number of times that, in [a,b], b was greater than a. How would I do this?
print(len([x for x in df['score'] if x[1] > x[0]]))
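A vectorized variant, as a rough sketch: it assumes every entry in score is a real Python list of exactly two numbers (not a string such as "[80,100]"). The sample values and the helper column names first and second are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: each row holds a two-element list [a, b].
df = pd.DataFrame({"score": [[80, 100], [90, 100], [100, 95], [70, 70]]})

# Split the lists into two numeric columns, then compare them element-wise.
pairs = pd.DataFrame(df["score"].tolist(), columns=["first", "second"], index=df.index)
count = int((pairs["second"] > pairs["first"]).sum())

print(count)
```

Ties such as [70, 70] are not counted, matching the strict `>` in the list comprehension above.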

How to group by and sum several columns?

I have a big dataframe with several columns that contain strings, numbers, etc. I am trying to group by SCENARIO and then sum only the columns between 2020 and 2050. The only thing I have managed so far is to sum one column, as shown below, but I need to replace this '2050' with all the columns between 2020 and 2050.
df1 = df.groupby(["SCENARIO"])['2050'].sum().sum(axis=0)
You are creating a subset of the df with only that single column. I can't tell what your dataset looks like from the information provided, but try:
df.groupby(["SCENARIO"]).sum(numeric_only=True)
This should sum the rows of every numeric column within each scenario; numeric_only=True skips the string columns.
Alternatively, select the columns on which you want to perform the summation:
df.groupby(["SCENARIO"])[["column1","column2"]].sum()
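To pick the year columns without typing each one, a sketch along these lines might help. It assumes the year columns are labelled with strings such as '2020' and '2050' (as in the question's own code); the REGION column and all values are invented for the example.

```python
import pandas as pd

# Hypothetical sample data with string, pre-2020 and in-range year columns.
df = pd.DataFrame(
    {
        "SCENARIO": ["base", "base", "high"],
        "REGION": ["north", "south", "north"],
        "2010": [1, 1, 1],
        "2020": [1, 2, 3],
        "2030": [4, 5, 6],
        "2050": [7, 8, 9],
    }
)

# Keep only columns whose label is a year between 2020 and 2050 inclusive.
year_cols = [c for c in df.columns if str(c).isdigit() and 2020 <= int(c) <= 2050]

# Sum those columns within each scenario.
totals = df.groupby("SCENARIO")[year_cols].sum()

print(totals)
```

If the year columns sit next to each other, df.loc[:, '2020':'2050'].columns would select the same labels by position in the frame.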

Plot values in specific column range for a particular row in a pandas data frame

I have a 10 rows x 26 columns data set of oil production by country or region between 1990 and 2011. The first column designates the country or region (e.g. Canada), the next 22 columns correspond to oil production for each year between 1990 and 2011, and the last two columns hold ratios of oil production in one year relative to another.
My goal is to simply plot the oil production as a function of time separately for each country (i.e. categorize by column 1 and discard the last two columns when plotting). What is the most efficient way to do this?
It seems like you want all of the columns in your data except the last two, so use .iloc[:, :-2] to select them. Because the first column holds the region names rather than numbers, set it as the index first. You then want to transpose the data (use .T) so that the years become the rows and the countries become the columns. Finally, plot your data.
df.set_index(df.columns[0]).iloc[:, :-2].T.plot()
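Here is a small sketch of the reshaping step that comes before plotting. The column names (region, ratio_a, ratio_b) and the numbers are hypothetical. It also assumes the region names live in the first column and the year labels are strings.

```python
import pandas as pd

# Hypothetical sample data: region name, yearly production, two ratio columns.
df = pd.DataFrame(
    {
        "region": ["Canada", "Mexico"],
        "1990": [1.0, 2.0],
        "1991": [1.5, 2.5],
        "1992": [2.0, 3.0],
        "ratio_a": [2.0, 1.5],
        "ratio_b": [1.3, 1.2],
    }
)

# Move the region names into the index, drop the last two columns, transpose.
production = df.set_index("region").iloc[:, :-2].T

# Convert the year labels to integers so they behave as a numeric x axis.
production.index = production.index.astype(int)

print(production)
```

Calling production.plot() on this frame (matplotlib required) should then draw one line per region with the years on the x axis.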