how to sort values in a horizontal bar graph that already has a variable - pandas

How do I sort the values based on top10[ActiveCases]? I don't seem to get the syntax right.
top10=df[0:21]
top10
plt.barh(top10['Country,Other'],width=top10['ActiveCases'])
plt.title("Top 10 countries with highest active cases")

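Before creating the graph, sort the frame and keep the result (sort_values returns a new DataFrame, it does not sort in place):
top10 = df.sort_values(by='ActiveCases', ascending=False).head(10)
top10.plot(kind='barh', x='Country,Other', y='ActiveCases')
This will plot the 10 countries with the highest active cases.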
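A minimal, self-contained sketch of the same idea. The column names follow the question, but the numbers are made up for illustration. Since barh draws the first row at the bottom, the ten largest rows are re-sorted in ascending order here so that the largest bar ends up on top.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "Country,Other": [f"Country {i}" for i in range(1, 16)],
    "ActiveCases": [120, 950, 30, 400, 610, 75, 880, 15, 300, 505, 42, 760, 230, 990, 5],
})

# Keep the 10 largest rows, then sort ascending so the largest bar is drawn last (on top).
top10 = df.nlargest(10, "ActiveCases").sort_values("ActiveCases")

fig, ax = plt.subplots()
ax.barh(top10["Country,Other"], width=top10["ActiveCases"])
ax.set_title("Top 10 countries with highest active cases")
fig.tight_layout()
```

If you prefer the largest bar at the bottom, sort with ascending=False instead.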


Proportions for multiple subcategories

I am trying to calculate proportions with multiple subcategories. As seen in the screenshot below, the series is grouped by ['budget_levels', 'revenue_levels'].
I would like to calculate the proportion for each.
For example,
budget_levels=='low' & revenue_levels=='low' / budget_levels=='low'
budget_levels=='low' & revenue_levels=='medium' / budget_levels=='low'
However, I am not getting the desired output.
Is there a way to do this calculation for each combination in one line, for example with .apply(lambda ...)?
Use value_counts to get the number of occurrences of each combination. Then group by the column budget_levels (the first index level) and divide the counts in each group by the group's sum. sort_index makes it easier to compare the groups.
df.value_counts().groupby(level=0).transform(lambda x: x / x.sum()).sort_index()
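Here is a small runnable sketch of that one-liner. The data is invented, and it assumes df contains only the two columns budget_levels and revenue_levels (otherwise select them first with df[['budget_levels', 'revenue_levels']]).

```python
import pandas as pd

# Hypothetical data with the two columns named in the question.
df = pd.DataFrame({
    "budget_levels": ["low", "low", "low", "high", "high"],
    "revenue_levels": ["low", "low", "medium", "high", "low"],
})

# Count each (budget, revenue) combination; the result has a two-level index.
counts = df.value_counts()

# Divide every count by the total of its budget_levels group (index level 0).
proportions = (
    counts.groupby(level=0)
    .transform(lambda x: x / x.sum())
    .sort_index()
)
print(proportions)
```

Within each budget_levels group the proportions add up to 1, which is a quick way to check the result.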

how to sum rows in my dataframe Pandas with specific condition?

Could anyone help me ?
I want to sum the values with the format:
print (...+....+)
for example:
a b
France 2
Italie 15
Croatie 7
I want to make the sum of France and Croatie.
Thank you for your help !
One possible solution:
set column a as the index,
using loc select rows for the "wanted" values,
take column b,
sum the values found.
So the code can be:
result = df.set_index('a').loc[['France', 'Croatie']].b.sum()
Note the double square brackets. The outer pair is the indexing operator of loc.
The inner pair, together with what is inside it, is the list of index values to select.
To subtract two sums (one for some set of countries and the second for another set),
you can run e.g.:
wrk = df.set_index('a').b
result = wrk.loc[['Italie', 'USA']].sum() - wrk.loc[['France', 'Croatie']].sum()
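The steps can be tried on the sample from the question as follows. The second variant, using isin, is an alternative not mentioned above; it skips the set_index step and does not raise a KeyError when one of the names is missing from column a.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"a": ["France", "Italie", "Croatie"], "b": [2, 15, 7]})

# Variant 1: index by country name, select the wanted rows, sum column b.
result = df.set_index("a").loc[["France", "Croatie"], "b"].sum()

# Variant 2 (alternative): boolean mask with isin, no index change needed.
result_isin = df.loc[df["a"].isin(["France", "Croatie"]), "b"].sum()

print(result, result_isin)
```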

a bar chart based on the total numbers for each year in Pandas

I have two columns in which there are different numbers in different rows for each year.
First, I need to display the sorted values based on the total numbers for each year.
Second, I need to create a bar chart in which the y-axis is 'year' and each bar has a label showing the total number for that year.
I'm not sure if I explained the problem clearly, but I would appreciate some help.
Let us do (Series.sum(level=0) was removed in pandas 2.0, so group on the index level instead):
df.set_index('Year')['Goals scored'].groupby(level=0).sum().sort_index().plot(kind='bar')
Try:
sum_by_years = (df.groupby('Year')['Goals scored'].sum()
                  .sort_values(ascending=False)
               )
sum_by_years.plot.barh()
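The question also asks for a label on each bar. One way to sketch this, assuming matplotlib 3.4 or newer (where Axes.bar_label is available) and using invented numbers with the column names from the answers above:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; several rows per year, as described in the question.
df = pd.DataFrame({
    "Year": [2010, 2010, 2014, 2014, 2018],
    "Goals scored": [3, 2, 4, 4, 1],
})

# Total per year, largest first.
sum_by_years = (
    df.groupby("Year")["Goals scored"].sum().sort_values(ascending=False)
)

fig, ax = plt.subplots()
sum_by_years.plot.barh(ax=ax)   # years on the y-axis
ax.bar_label(ax.containers[0])  # write each total next to its bar
ax.set_xlabel("Goals scored")
fig.tight_layout()
```

Printing sum_by_years before plotting covers the first part of the question, the sorted totals per year.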

How to check the highest score among specific columns and compute the average in pandas?

Help with homework problem: "Let us define the "data science experience" of a given person as the person's largest score among Regression, Classification, and Clustering. Compute the average data science experience among all MSIS students."
Beginner to coding. I am trying to figure out how to check amongst columns and compare those columns to each other for the largest value. And then take the average of those found values.
I greatly appreciate your help in advance!
Picture of the sample data set: 1: https://i.stack.imgur.com/9OSjz.png
Provided Code:
import pandas as pd
df = pd.read_csv("cleaned_survey.csv", index_col=0)
df.drop(['ProgSkills','Languages','Expert'],axis=1,inplace=True)
Sample Data:
What I have tried so far:
df[data_science_experience]=df[["Regression","Classification","Clustering"]].values.max()
df['z']=df[['Regression','Classification','Clustering']].apply(np.max,axis=1)
df[data_science_experience]=df[["Regression","Classification","Clustering"]].apply(np.max,axis=1)
If you want to get the highest score of column 'hw1' you can get it with:
df['hw1'].max(). df['hw1'] gives you a Series of all the values in that column and max returns the maximum. For the average use mean:
df['hw1'].mean()
If you want to find the maximum of each of several columns, you can use:
maximum_list = []
for col in df.columns:
    maximum_list.append(df[col].max())
highest = max(maximum_list)
avg = sum(maximum_list) / len(maximum_list)
Hope this helps.
First, you want to get only the rows with MSIS in the Program column. That can be done in the following way:
df[df['Program'] == 'MSIS']
Next, you want to get only the Regression, Classification and Clustering columns. The previous query filtered only rows; we can add to that, like this:
df.loc[df['Program'] == 'MSIS', ['Regression', 'Classification', 'Clustering']]
Now, for each row remaining, we want to take the maximum. That can be done by appending .max(axis=1) to the previous line (axis=1 because we want the maximum of each row, not each column).
At this point, we should have a Series in which each value is the highest of the three scores for one student. Now, all that's left to do is take the mean, which can be done with .mean(). The full code should therefore look like this:
df.loc[df['Program'] == 'MSIS', ['Regression', 'Classification', 'Clustering']].max(axis=1).mean()
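To see the chain at work, here is a small sketch on invented data. The column names come from the question; the scores and the non-MSIS program name are hypothetical.

```python
import pandas as pd

# Hypothetical survey rows; the real data comes from cleaned_survey.csv.
df = pd.DataFrame({
    "Program": ["MSIS", "MSIS", "MBA"],
    "Regression": [3, 1, 5],
    "Classification": [4, 2, 5],
    "Clustering": [2, 5, 5],
})

score_cols = ["Regression", "Classification", "Clustering"]

# Row-wise maximum of the three scores, MSIS students only.
experience = df.loc[df["Program"] == "MSIS", score_cols].max(axis=1)

# Average "data science experience" across those students.
average_experience = experience.mean()
print(average_experience)
```

Keeping the intermediate Series in a variable such as experience makes it easy to inspect the per-student maxima before averaging.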

icCube: multiple dimensions in MDX output

The documentation of icCube states:
However, a SELECT is not limited to two axes. We could have columns,
rows, pages, chapters, and sections. And you could still continue
beyond these by specifying a number for the axis.
Indeed, when I try using three dimensions on the demo Sales cube, it works:
select
{[paris], [london]} on 0,
{[2005], [2006]} on 1,
product.members on 2
from sales
However, when I try four dimensions:
select
{[paris], [london]} on 0,
{[2005], [2006]} on 1,
product.members on 2,
measures.members on 3
from sales
I get an error message: Unexpected number of axes (4) for the pivot table (expected:0..3)
What am I missing?
There is nothing wrong with using a four-axis query. However, it is up to the client you are using to be able to display the result.
For example, Excel accepts 2D results, the icCube pivot table is able to display results up to (and including) 3 axes.
Hope that helps.