Most efficient way to find the average of multiple dataframe rows with similar but changing values - pandas

I have a dataframe which contains data regarding CO2 levels over time and has two key columns: Year and ppm. Year goes from 1974 to 2019, and there are several rows for each year. So for example 1974 starts with a ppm of 333.34, and very next row is 1974 with a ppm of a slightly different ppm. There's 2000+ rows total. I want to get the average ppm for each year and plot for each single year.
I'm trying to figure out the best way to do this. Right now some things I've considered:
df_Year = df.loc[df['Year']==1975]
which would isolate all of the 1975 rows, then use
df_Year['ppm'].astype("float").mean(axis=0)
and I could then get the average that way, but that's just one year. I am thinking I could make a loop which iterates through each year and gets the average and then assigns the average ppm to a list or dictionary or something.
But it just seems kind of lengthy. Isn't there a more efficient way?

Related

Power pivot ytd calculation

Ok, I have watched many videos and read all sorts and I think I am nearly there, but must be missing something. In the data model I am trying to add the ytd calc to my product_table. I don't have unique dates in the product_table in column a and also they are weekly dates. I have all data for 2018 for each week of this year in set rows of 20, incrementing by one week every 20 rows. E.g. rows 1-20 are 01/01/2018, rows 21-40 are 07/01/2018, and so on.
Whilst I say they are in set rows of 20, this is an example. Some weeks there are more or less than 20 so I can't use the row count function-
Between columns c and h I have a bunch of other categories such as customer age, country etc. so there isn't a unique identifier. Do I need one for this to work? Column i is the sales column with the numbers. What I would like is a new column which gives me a ytd number for each row of data which all has unique criteria between a and h. Week 1 ytd is not going to be any different. For the next 20 rows I want it to add week1 sales to week2 sales, effectively giving me the ytd.
I could sumproduct this easily in the data set but I don't want do that. I want to use dax to save space etc..
I have a date_table which does have unique dates in the main_date column. All my date columns are formatted as date in the data model.
I have tried:
=calculate(products[sales],datesytd(date_table[main_date]))
This simply replicates the numbers in the sales column, not giving me an ytd as required. I also tried
=calculate(sum(products[sales]) ,datesytd(date_table[main_date]))
I don't know if what I am trying to do is possible. All the youtube clips don't seem to have the same issues I am having but I think they have unique dates in their data sets.
Id love to upload the data but its work stuff on a work computer so cant really. Hope I've painted the picture quite clearly.
Resolved, after googling sumif dax, mike honey had a response that i have adapted to get what i need. I needed to add the filter and earlier functions to my equarion and it ended up like this
Calculate (sum(products[sales]),
filter (sales, sales[we_date] <=earlier(sales[we_date]),
filter (sales, sales[year] =earlier(sales[year]),
filter (sales, sales[customer] =earlier(sales[customer]))
There are three other filter sections i had to add, but this now gives me the ytd i needed.
Hope this helps anyone else

Calculate average VBA from dynamic table

I ve a table on Excel with the week number and the weight (in Kg.)
Somebody can insert his weight every day during a week or just once or not at all. I can't manage that.
Then my table can literally change. from zero to 7 even more lines a week (like the yellow side of the image).
What I wanna do is to calculate the weight average per week. and then I will have one line for each week, when i got at least one weight (sometimes I won't have any line). We can have week without any weight so then I don't want this line at all. We can also easily have a weight for the week 2 but between the weeks 5 and 6 in the yellow table. That would happen if someone insert his weight after others.
How can I say this two weeks are similar, so we calculate the average for this two weight ?
I hope it's enough clear with this picture
Use formula below in Column C to calculate average(assume Week in column A and Weight in column B)
=AVERAGEIF(A:A,A2,B:B)
Average Column Copy->PasteSpecial value only,
then Remove Duplicates base on Week and the new Average Column

SQL GROUPING SETS averages with multiple many-to-many dimensions

I have a table of data with the following:
User,Platform,Dt,Activity_Flag,Total_Purchases
1,iOS,05/05/2016,1,1
1,Android,05/05/2016,1,2
2,iOS,05/05/2016,1,0
2,Android,05/05/2016,1,2
3,iOS,05/05/2016,1,1
3,Android,06/05/2016,1,3
1,iOS,06/05/2016,1,2
4,Android,06/05/2016,1,2
1,Android,06/05/2016,1,0
3,iOS,07/05/2016,1,2
2,iOS,08/05/2016,1,0
I want to do a GROUPING SETS (Platform,Dt,(Platform,Dt),()) aggregation to be able to find for each combination of Platform and Dt the following:
Total Purchases
Total Unique Users
Average Purchases per User per Day
The first two are simple as these can be achieved via a sum(Total_Purchases) and count(distinct user) respectively.
The problem I have is with the last metric. The result set should look like this but I don't know how to get the last column to be calculated correctly:
Platform,Dt,Total_Purchases,Total_Unique_Users,Average_Purchases_Per_User_Per_Day
Android,05/05/2016,4,2,2.0
iOS,05/05/2016,2,3,0.7
Android,06/05/2016,5,3,1.7
iOS,06/05/2016,2,1,2.0
iOS,07/05/2016,2,1,2.0
iOS,08/05/2016,0,1,0.0
,05/05/2016,6,3,2.0
,06/05/2016,7,3,2.3
,07/05/2016,1,1,1.0
,08/05/2016,1,1,1.0
Android,,9,4,1.8
iOS,,6,3,1.2
,,15,4,1.6
For the first ten rows we see that getting the Average purchase per user per day is a simple division of the first two columns as the dimension in these rows represent a single date only. But when we look at the final 3 rows we see that the division is not the way to achieve the desired result. This is because it needs to take an average for each day in turn to get the overall per day amount.
If this isn't clear please let me know and I'll be happy to explain better. This is my first post on this site!

How do I compute an average of calculated averages in MS reportviewer/rdlc?

I've searched here and elsewhere on the web and have not found this exact problem/solution.
I'm building an rdlc report using the MS reportViewer - the report I'm creating is based on an existing spreadsheet where the average price across 6 months is calculated individually for each month, then the average of those prices is calculated as the 6 month period average price. Whether I agree with that methodology or if it's correct is irrelevant, I just need to know how to get an rdlc to do this.
For example:
Month Price1 Price2 Delta
May-12 $31.54 $30.03 $1.51
Jun-12 $36.27 $34.60 $1.67
Jul-12 $44.19 $42.00 $2.19
Aug-12 $38.96 $37.06 $1.90
Sep-12 $36.89 $35.08 $1.81
Oct-12 $35.57 $33.97 $1.60
Average $37.24 $35.46 $1.78
(sorry for the lack of a screen snip, I'm new and the system won't let me post an image...)
I've created a tablix that does the monthly averages computation - I use a group in the table to group the 6 months of data by month (and then hide the hourly price data so you only see the month total row) but I'm stuck on how to calculate the bottom row of the table which is the average of each column. (the average of the averages is not the same as the average of all 6 months of prices from the underlying data - that's what I've learned in this process... IOW, that was my first solution :-) )
What I tried to do to get the average of the averages was give the month total cell a name, MonthlyAvgPrice1, then in the bottom row, used this expression:
Avg(reportitems!MonthlyAvgPrice1.Value)
As I kind of expected, this didn't work, when I try to run the report, it gets a build error saying "The Value expression for the textrun 'Price1PeriodAvg.Paragraphs[0].TextRuns[0]' uses an aggregate function on a report item. Aggregate functions can be used only on report items contained in page headers and footers."
Hopfully I've explained this well, does anyone know how to do this?
Thanks!
-JayG
Actually it is not clear from the question that how are you in particular binding the data to the report items, But from the given information what I understand is that you can
Try like this:
Right Click the tablix row and insert a row below
In the cell where you want to have this Average of Averages insert the following expression
=Sum(Fields!Price1.Value)/6
and similarly insert expression =Sum(Fields!Price2.Value)/6 and =Sum(Fields!Delta.Value)/6 in the other cells where you want to display the Averages
Of Course, you will change the Field names Price1,Price2 etc to the fields that you are getting the values from.
HTH

Sql Queries for finding the sales trend

Suppose ,I have a table which has all the billing records. Now I want to see the sales trend for a user given time duration group by each 3 days ...what should be the sql query regarding this?
please help,Otherwise I am gone ...
I can only give a vague suggestion as per the question, however you may want to have a derived column with a standardised date (as per MS date format, just a number per day) that you could then use a modulus (3) on so that days are equal per 3 day period. You can then group and aggregate over this column to get the values for a 3 day period. Obviously to display the date nicely you would have to multiply back and convert your column as well.
Again I'm not sure of the specifics, but I think this general idea could be achieved to get a result (may well not be the best way so it would help to add more to the question...)