Find mean batting average ("Ave") and total number of centuries by country, but only including records for which the starting year is 2010 or later - pandas

I have a dataset of cricket players and want to find the mean batting average ("Ave") and the total number of centuries ("Hundreds") by country, but only including records for which the starting year ("From") is 2010 or later.
Ave, Hundreds, Country and From are the column names.
new_data.groupby(['Country'])['Ave'].mean()
new_data.groupby(['Country'])['Hundreds']
I want to apply these two in a single line, and also apply the condition that the starting year should be 2010 or later.

I am assuming you have only two numeric columns, Ave and Hundreds. You can do it using the pandas .agg method:
grouped_data = new_data[new_data['From'] >= 2010].groupby(['Country'])  # use new_data['From'].dt.year >= 2010 if 'From' is a datetime column
grouped_data.agg(['mean', 'sum'])
Let me know if it doesn't work.

new_data[new_data['From'] >= 2010].groupby(['Country'])['Ave'].mean()
You can do the same for 'Hundreds'.
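If you want both aggregations in a single statement, one option is named aggregation via .agg — a minimal sketch, assuming 'From' holds plain integer years (swap in new_data['From'].dt.year if it is a datetime column) and with illustrative output column names:
new_data[new_data['From'] >= 2010].groupby('Country').agg(
    mean_ave=('Ave', 'mean'),            # mean batting average per country
    total_hundreds=('Hundreds', 'sum')   # total centuries per country
)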

Related

Most efficient way to find the average of multiple dataframe rows with similar but changing values

I have a dataframe which contains data regarding CO2 levels over time and has two key columns: Year and ppm. Year goes from 1974 to 2019, and there are several rows for each year. So for example 1974 starts with a ppm of 333.34, and the very next row is 1974 with a slightly different ppm. There are 2000+ rows in total. I want to get the average ppm for each year and plot it for every single year.
I'm trying to figure out the best way to do this. Right now some things I've considered:
df_Year = df.loc[df['Year']==1975]
which would isolate all of the 1975 rows, then use
df_Year['ppm'].astype("float").mean(axis=0)
and I could then get the average that way, but that's just one year. I am thinking I could make a loop which iterates through each year, gets the average, and then assigns the average ppm to a list or dictionary or something.
But it just seems kind of lengthy. Isn't there a more efficient way?
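For what it's worth, the usual shortcut for this is a groupby rather than a loop — a minimal sketch, assuming the columns are literally named Year and ppm:
df['ppm'] = df['ppm'].astype(float)            # same cast as above, in case ppm is stored as text
yearly_avg = df.groupby('Year')['ppm'].mean()  # one mean ppm value per year
yearly_avg.plot()                              # quick plot of the yearly averages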

Power Pivot YTD calculation

OK, I have watched many videos and read all sorts of material, and I think I am nearly there, but I must be missing something. In the data model I am trying to add the YTD calculation to my product_table. I don't have unique dates in column A of the product_table, and they are weekly dates. I have all data for 2018 for each week of this year in set rows of 20, incrementing by one week every 20 rows. E.g. rows 1-20 are 01/01/2018, rows 21-40 are 07/01/2018, and so on.
Whilst I say they are in set rows of 20, this is an example. Some weeks there are more or fewer than 20 rows, so I can't rely on a row count.
Between columns C and H I have a bunch of other categories such as customer age, country etc., so there isn't a unique identifier. Do I need one for this to work? Column I is the sales column with the numbers. What I would like is a new column which gives me a YTD number for each row of data, where each row has unique criteria across columns A to H. Week 1 YTD is not going to be any different. For the next 20 rows I want it to add week 1 sales to week 2 sales, effectively giving me the YTD.
I could SUMPRODUCT this easily in the data set, but I don't want to do that. I want to use DAX to save space, etc.
I have a date_table which does have unique dates in the main_date column. All my date columns are formatted as date in the data model.
I have tried:
=calculate(products[sales],datesytd(date_table[main_date]))
This simply replicates the numbers in the sales column, not giving me a YTD as required. I also tried:
=calculate(sum(products[sales]) ,datesytd(date_table[main_date]))
I don't know if what I am trying to do is possible. The YouTube clips I've watched don't seem to have the same issues I am having, but I think they have unique dates in their data sets.
I'd love to upload the data, but it's work stuff on a work computer so I can't really. Hope I've painted the picture clearly enough.
Resolved: after googling "sumif dax", Mike Honey had a response that I have adapted to get what I need. I needed to add the FILTER and EARLIER functions to my expression, and it ended up like this:
CALCULATE (SUM(products[sales]),
    FILTER (sales, sales[we_date] <= EARLIER(sales[we_date])),
    FILTER (sales, sales[year] = EARLIER(sales[year])),
    FILTER (sales, sales[customer] = EARLIER(sales[customer]))
)
There are three other FILTER sections I had to add, but this now gives me the YTD I needed.
Hope this helps anyone else.

Aggregating 15-minute data into weekly values

I'm currently working on a project in which I want to aggregate data (resolution = 15 minutes) into weekly values.
I have 4 weeks, and the view should include a value for each week AND every station.
My dataset includes more than 50 stations.
What I have is this:
select name, avg(parameter1), avg(parameter2)
from data
where week in ('29','30','31','32')
group by name
order by name
But it only displays the average value across all weeks. What I need is an average value for each week and each station.
Thanks for your help!
The problem is that when you GROUP BY just name, you flatten the weeks and can only apply aggregate functions to them.
Your best option is to GROUP BY both name and week, so something like:
select name, week, avg(parameter1), avg(parameter2)
from data
where week in ('29','30','31','32')
group by name, week
order by name
PS - It's not entirely clear whether you need one set of results for stations and one for weeks, or a set of results for every week at every station (which this answer provides the solution for). If you require the former, then separate queries are the way to go.

MDX - Divide each row by a value based on parent

I am in a situation where I need to calculate a percentage for every fiscal year depending on the distinct count of rows.
I have achieved the distinct count (a fairly simple task) for each year city-wise and arrived at these two listings in the cube.
The first listing is the state-wide distinct count for a given year.
The second listing is the city-wise distinct count for a given year, with a percentage based on the state-wide count for that year.
My problem is that I need to prepare a calculated member for the percentage column for each given year.
For example, in year 2009, City 1 has a distinct count of 2697 and a percentage of 32.94% (formula used: 2697/8187).
I tried with ([Measures].[Distinct Count])/(SUM(ROOT(),[Measures].[Distinct Count])) but no luck.
Any help is highly appreciated.
Thanks in advance.
PS: The city-wise sum for year 2009 can never equal the state-wide distinct count for that year, because we are calculating the distinct count for both city and state.
You need to create a region hierarchy for this, like State -> City, then create a calculation like the one below. Then, in the browser, put your hierarchy on the left and the sales amount and calculated percentage in the values.
([Dim].[Region].CurrentMember, [Measures].[Salesamt]) /
iif(
([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt]) = 0,
([Dim].[Region].CurrentMember, [Measures].[Salesamt]),
([Dim].[Region].CurrentMember.Parent, [Measures].[Salesamt])
)

Using MIN on a datepart with GROUP BY not working, returns different dates

Can anyone help with an aggregate function, MIN?
I have a car table from which I want to return the minimum sale price and minimum year, on a table that has identical cars but different years and prices.
Basically, if I remove Registration (which contains a year) from the GROUP BY and SELECT, the query works, but if I leave it in then I get 3 cars returned which are exactly the same model, make, etc. but with different years.
But I am using MIN, so it should return 1 car with the year 2006 (the minimum year among the 3 cars).
The MIN(SalePrice) is working perfectly; it's the registration that's not working.
Any ideas?
SELECT
MIN(datepart(year,[Registration])) AS YearRegistered,
MIN(SalePrice), Model, Make
FROM
[VehicleSales]
GROUP BY
datepart(year,[Registration]), Model, Make
If I have correctly understood what you are looking for, you should query:
SELECT Model, Make, MIN(datepart(year,[Registration])) AS YearRegistered, MIN(SalePrice)
FROM [VehicleSales]
GROUP BY Model, Make
Hope it helps.
Turro's answer will return the lowest registration year and the lowest price for each (Model, Make), but this doesn't mean the lowest price will be for the car with the lowest year.
Is that what you need?
Or do you need one of these:
the lowest price among the cars having the lowest year
the lowest year among the cars having the lowest price
-- EDITED ---
You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
That's why I made a comment. Imagine the following situation:
Porsche 911 2004 2000
Porsche 911 2004 3000
Porsche 911 2005 1000
Porsche 911 2005 5000
You'll get a result that will not really tell you whether this car gets cheaper from year to year or not:
Porsche 911 2004 1000
I don't know how you'll tell whether a car gets cheaper the next year based on one row, without at least a comparison with the previous year.
P.S. I'd like to buy one of the cars above for the listed price :D
You're getting what you're asking for: the cars are put into different groups whenever their model, make, or year is different, and the (minimum, i.e. only) year and the minimum price for each of those groups are returned.
Why are you using GROUP BY?
You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
You should find the cheapest (or average) price per make/model per year and compare it with the cheapest (or average) price from the previous year (for the same make/model).
Then you can see which of them get cheaper the next year (I suppose most of them do).
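As a rough sketch of that comparison, written in pandas purely to illustrate the logic (the DataFrame name cars and the Year column, extracted from Registration, are assumptions; the other column names follow the query above):
# assumed DataFrame 'cars' with columns Make, Model, Year and SalePrice
cheapest = (cars.groupby(['Make', 'Model', 'Year'], as_index=False)['SalePrice'].min()
                .sort_values(['Make', 'Model', 'Year']))
# cheapest price for the same make/model in the previous (available) year
cheapest['PrevYearPrice'] = cheapest.groupby(['Make', 'Model'])['SalePrice'].shift(1)
# rows where the minimum price dropped compared with the previous year
got_cheaper = cheapest[cheapest['SalePrice'] < cheapest['PrevYearPrice']]
In SQL Server 2012 or later, the same comparison can be expressed with the LAG window function over the grouped results.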