Cities and Sales data don’t align - dataframe

I grouped the cities using .groupby, but the groups don’t correspond to the correct sales figures for each city.
After plotting the bar chart, there’s no alignment between the cities and the sales. The result looks random.
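The question includes no code, so the cause can only be guessed. One common cause is taking the city labels from one object (for example `df['City'].unique()`, which keeps first-appearance order) and the totals from the `groupby` result, which is sorted by key. Here is a minimal sketch, assuming hypothetical columns `City` and `Sales` and made-up numbers, that reads labels and values from the same aggregated Series so they cannot drift apart:

```python
import pandas as pd

# Hypothetical data; the column names 'City' and 'Sales' are assumptions.
df = pd.DataFrame({
    "City": ["Paris", "Berlin", "Paris", "Madrid", "Berlin", "Madrid"],
    "Sales": [100, 50, 25, 10, 75, 5],
})

# groupby returns a Series indexed by city, sorted by the group key.
sales_by_city = df.groupby("City")["Sales"].sum()

# unique() keeps first-appearance order, which differs from the groupby order.
labels_wrong = list(df["City"].unique())
labels_right = list(sales_by_city.index)


def plot_sales(series):
    """Draw a bar chart with labels and heights taken from the same Series."""
    import matplotlib.pyplot as plt  # imported here so the data steps run without it
    fig, ax = plt.subplots()
    ax.bar(series.index, series.values)
    ax.set_xlabel("City")
    ax.set_ylabel("Total sales")
    return ax
```

Calling `plot_sales(sales_by_city)` then draws each bar under its own city, because both come from `sales_by_city`.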

Related

how to sort values in a horizontal bar graph that already has a variable

How do I sort the values based on top10['ActiveCases']? I can't seem to get the syntax right.
top10=df[0:21]
top10
plt.barh(top10['Country,Other'],width=top10['ActiveCases'])
plt.title("Top 10 countries with highest active cases")
Before creating the graph, sort the rows and keep the ten largest:
top10 = df.sort_values(by='ActiveCases', ascending=False).head(10)
top10.plot(kind='barh', x='Country,Other', y='ActiveCases')
This plots the 10 countries with the most active cases.

a bar chart based on the total numbers for each year in Pandas

I have two columns in which there are different numbers in different rows for each year.
First, I need to display the sorted values based on the total numbers for each year.
Second, I need to create a bar chart in which the y-axis is 'Year' and each bar has a label showing the total number for that year.
I'm not sure if I explained the problem clearly, but I would appreciate some help.
Let us do
df.set_index('Year')['Goals scored'].groupby(level=0).sum().sort_index().plot(kind='bar')
Try:
sum_by_years = (df.groupby('Year')['Goals scored'].sum()
                  .sort_values(ascending=False)
               )
sum_by_years.plot.barh()
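Neither answer covers the per-bar labels the question asks for. A minimal sketch, assuming made-up rows and Matplotlib 3.4 or newer (for `Axes.bar_label`), that puts years on the y-axis and writes each total next to its bar:

```python
import pandas as pd

# Hypothetical rows; 'Year' and 'Goals scored' are the column names used in the answers.
df = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019, 2020],
    "Goals scored": [3, 2, 4, 4, 1],
})

# Total per year, largest first.
totals = df.groupby("Year")["Goals scored"].sum().sort_values(ascending=False)


def plot_totals(series):
    """Horizontal bars with years on the y-axis and each bar labelled with its total."""
    import matplotlib.pyplot as plt  # imported here so the aggregation runs without it
    fig, ax = plt.subplots()
    bars = ax.barh(series.index.astype(str), series.values)
    ax.bar_label(bars)  # requires Matplotlib 3.4 or newer
    ax.set_ylabel("Year")
    return ax
```

`plot_totals(totals)` draws the chart. Converting the years to strings keeps Matplotlib from treating them as a continuous numeric axis.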

How can I plot the average of a column based off of the values in another column?

I have a list of campaign donations and I want to create a plot with the average 'contribution_receipt_amount' for each candidate.
indv.columns
# produces this output:
#Index(['candidate_name', 'committee_name', 'contribution_receipt_amount',
'contribution_receipt_date', 'contributor_first_name',
'contributor_middle_name', 'contributor_last_name',
'contributor_suffix', 'contributor_street_1', 'contributor_street_2',
'contributor_city', 'contributor_state', 'contributor_zip',
'contributor_employer', 'contributor_occupation',
'contributor_aggregate_ytd', 'report_year', 'report_type',
'contributor_name', 'recipient_committee_type',
'recipient_committee_org_type', 'election_type',
'fec_election_type_desc', 'fec_election_year', 'filing_form', 'sub_id',
'pdf_url', 'line_number_label'],
dtype='object')
First aggregate the mean into a Series, then use Series.plot:
indv.groupby('candidate_name')['contribution_receipt_amount'].mean().plot()
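Note that `Series.plot()` draws a line chart by default. With candidate names as categories, a bar chart may be easier to read. A small sketch with made-up rows (the candidate names and amounts are hypothetical):

```python
import pandas as pd

# Hypothetical rows; only the two columns used in the answer are included.
indv = pd.DataFrame({
    "candidate_name": ["A", "B", "A", "C", "B"],
    "contribution_receipt_amount": [100.0, 50.0, 300.0, 20.0, 150.0],
})

# Mean contribution per candidate, largest first.
avg = (indv.groupby("candidate_name")["contribution_receipt_amount"]
           .mean()
           .sort_values(ascending=False))

# avg.plot(kind="bar") would draw one bar per candidate (needs Matplotlib installed).
```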

Parse data from Morningstar Direct to worksheet

I have to put together a report every quarter using data pulled from Morningstar Direct. I have to automate the whole process, or at least parts of it. We have put this report together for the last two quarters, and we use the same format each time. So we already have the general templates for the report; now I'm just looking for a way to pull the data from Morningstar and put it into the templates correctly.
Does anyone have any general idea where I should start?
A               B           C       D      E    F
Group           Name        Weight  Gross  Net  Contribution
Equity                      25%     10%    8%   .25
                IBM         5%      15%    12%
                AAPL        7%      23%    18%
Fixed Income                25%     5%     4%   .17
                10 Yr Bond  10%     7%     5%
Emerging Mrkts
And it goes on breaking things into more groups, and there are many more holdings within each group.
What I want it to do is search until it finds "Equity", for example, then move to the next row, grab the name of the position, its weight, and its net return, and do that for each holding in Equity. Then I want it to do the same thing in Fixed Income, and so on, selecting the names, weights, and net returns for each holding. Then copy and paste them into another workbook.
Is there any way that is possible?
It sounds like you need to parse your information. By using left(), right(), and mid() you can select the good data and ignore the superfluous. You could separate the data in one cell into multiple cells in the desired format.
A B
Name Address
John Q. Public 123 My Street, City, State, Zip
E (First Name) F (Middle Initial) (extra work to program missing data)
=LEFT(A2,FIND(" ",A2)) =MID(A2,LEN(E2)+1,FIND(" ",MID(A2,LEN(E2)-1,99)))
G (Last Name) H (City)
=MID(A2,(LEN(E2)+LEN(F2)+2),99) =MID(B2,LEN(H2)+2,FIND(",",MID(B2,LEN(H2)+2,99))-1)
I (State)
=MID(B2,(LEN(I2)+LEN(H2)+4),FIND(",",MID(B2,(LEN(I2)+LEN(H2)+4),99))-1)
J (Zip Code)
=MID(B2,(LEN(H2)+LEN(I2)+LEN(J2)+6),99)
This code will parse the name in the cell A2 and address in cell B2 into separate fields.
Similar cuts should allow you to get rid of the unwanted data.
==================================================================
7/8/2015
Your data seems to be your desired output. If so, please provide sanitized input data for comparison. You probably need to loop through your input to find the groups. When the group changes, prepare the summary figures.
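The loop suggested above (walk the rows, notice when the group changes, collect the holdings under it) can be sketched in Python. This is only an illustration of the logic, not Morningstar or Excel code. It assumes the group name sits in the first column and each holding's name in the second, as in the sample, and the function name `holdings_by_group` is made up for this example:

```python
# Rows as (group, name, weight, gross, net, contribution), mirroring the sample layout.
# Empty strings stand for blank cells; the values are copied from the question's sample.
rows = [
    ("Equity", "", "25%", "10%", "8%", ".25"),
    ("", "IBM", "5%", "15%", "12%", ""),
    ("", "AAPL", "7%", "23%", "18%", ""),
    ("Fixed Income", "", "25%", "5%", "4%", ".17"),
    ("", "10 Yr Bond", "10%", "7%", "5%", ""),
]


def holdings_by_group(rows):
    """Return {group: [(name, weight, net), ...]} by tracking the current group."""
    result = {}
    current = None
    for group, name, weight, _gross, net, _contribution in rows:
        if group:
            # A group header row starts a new section.
            current = group
            result[current] = []
        elif name and current is not None:
            # A holding row belongs to the most recent group header.
            result[current].append((name, weight, net))
    return result


holdings = holdings_by_group(rows)
```

The same pattern carries over to a VBA loop: keep a variable for the current group, update it on header rows, and copy name, weight, and net return from every other row.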

vba loop through all the pivot fields of a pivot table and return specified values

I have a dataset whose entries have 5 different attributes and one value. For example, I have the heights of 5000 people. For each person I have their hair color, eye color, nationality, the city where they were born, and their mother's name (the 5 dimensions).
No/Eye Color/Hair Color/Nationality/Hometown/Mother's Name/Height
Blue Blond Swiss Zürich Nicole 184
Blue Brown English York Ruby 164
Brown Brown French Paris Sophie 154
etc..
So there are 5 dimensions. The data is set dynamically, so the number of categories in each dimension can vary. I want to compute the average height of people depending on which dimensions I include (from 1 to 5). For example, I want to retrieve:
The average height of French, blue-eyed people. The next day, only the people born in London. And the week after, the Swiss, blue-eyed, red-haired people born in Geneva whose mother is called Nicole.
So I created a pivot table with Eye Color as Row Labels, Hair Color as Column Labels, the average height as the Data, and the last 3 dimensions as Report Filters. This lets me see all the possible and desired combinations of average height that my data implies.
Now my goal is:
I want to create a macro that goes through all the possible combinations of my dimensions (i.e. 2^5-1=31) and stores in a vector every average height that is above a certain value, e.g. 190. It could then print them on a worksheet.
I was thinking of using Boolean arrays and a For Each...Next structure, but I must say that I fail to picture how to implement it.
Any ideas?
Thanks for the time and help!
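One way to picture the enumeration, sketched in Python rather than VBA: loop over every non-empty subset of the 5 dimensions (31 in total), group the rows by their values on that subset, and keep the groups whose average exceeds the threshold. The function name `averages_above` is made up for this example. The threshold of 180 is chosen only so the three sample rows produce some hits; the question's 190 would produce none on this sample.

```python
from itertools import combinations
from statistics import mean

DIMENSIONS = ["Eye Color", "Hair Color", "Nationality", "Hometown", "Mother's Name"]

# The three sample rows from the question.
people = [
    {"Eye Color": "Blue", "Hair Color": "Blond", "Nationality": "Swiss",
     "Hometown": "Zürich", "Mother's Name": "Nicole", "Height": 184},
    {"Eye Color": "Blue", "Hair Color": "Brown", "Nationality": "English",
     "Hometown": "York", "Mother's Name": "Ruby", "Height": 164},
    {"Eye Color": "Brown", "Hair Color": "Brown", "Nationality": "French",
     "Hometown": "Paris", "Mother's Name": "Sophie", "Height": 154},
]


def averages_above(people, threshold):
    """Return (dims, values, average) for every group whose average exceeds threshold."""
    hits = []
    for size in range(1, len(DIMENSIONS) + 1):
        # All subsets of this size; over all sizes that is 2**5 - 1 = 31 subsets.
        for dims in combinations(DIMENSIONS, size):
            groups = {}
            for person in people:
                key = tuple(person[d] for d in dims)
                groups.setdefault(key, []).append(person["Height"])
            for key, heights in groups.items():
                avg = mean(heights)
                if avg > threshold:
                    hits.append((dims, key, avg))
    return hits


hits = averages_above(people, 180)
```

Each entry in `hits` records which dimensions were used, the values that define the group, and the group's average, which is the information the macro would print to a worksheet.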