I'm a beginner with RStudio and need some guidance. I have a data set that includes the year, month, day, hour, minute, second and ParticleCount. I'm trying to figure out how to write code that calculates the average ParticleCount for every 2-hour interval.
I would really appreciate the guidance and support as I continue learning RStudio.
I have tried using rolling averages, but that method uses only the ParticleCount column and takes the average of rows 1-12, 2-13, 3-14, etc. What I need instead is the average of rows 1-12, 13-24, 25-36, etc.
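I've just created a variable the way you wanted, named rollavg.
library(tidyverse)
df %>%
  # rows 1-12 get block 0, rows 13-24 get block 1, and so on
  mutate(block = (row_number() - 1) %/% 12) %>%
  group_by(block) %>%
  # every row in a block receives that block's mean
  mutate(rollavg = mean(ParticleCount)) %>%
  ungroup()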
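For comparison, here is a minimal sketch of the same idea in Python with pandas rather than R. It groups by clock time instead of by row position, so it still works if a reading is missing. The sample values are made up, and it assumes the columns are named exactly as in the question.

```python
import pandas as pd

# Hypothetical sample: one reading every 30 minutes over four hours.
df = pd.DataFrame({
    "year": [2021] * 8,
    "month": [6] * 8,
    "day": [1] * 8,
    "hour": [0, 0, 1, 1, 2, 2, 3, 3],
    "minute": [0, 30] * 4,
    "second": [0] * 8,
    "ParticleCount": [10, 20, 30, 40, 50, 60, 70, 80],
})

# Hours 0-1 fall in block 0, hours 2-3 in block 1, and so on.
df["block"] = df["hour"] // 2

# One row per (date, 2-hour block) holding the mean particle count.
two_hour_avg = (
    df.groupby(["year", "month", "day", "block"])["ParticleCount"]
      .mean()
      .reset_index(name="avg_count")
)
print(two_hour_avg)
```

Grouping on the date columns as well as the block keeps the 00:00-02:00 window of one day separate from the same window on another day.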
Related
Revenue month chart of 2011
I am supposed to make a line chart to analyse the revenue for each month in a particular year. What should be done other than this? Is this one right?
The Revenue column is the problem. I believe you want a continuous measure (Sum, Max, Avg etc) while you are now using a continuous dimension.
Remove Revenue from Rows and bring in Sum(Revenue) or similar.
Wrong chart type, and (perhaps) data aggregation issue - you want a bar chart by month, not a line-chart by day.
My dataset is as follows (two columns: DATE and RATE).
I want to get the mean of RATE for each day (as the dataset shows, there are multiple rate values for the same day). I have about 1,000 rows, so I am trying to find an easier way to calculate the mean for each day and then save the results to a data frame.
You have to group by date, then aggregate:
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html
In your case
df.groupby('DATE').agg({'RATE': ['mean']})
You can group by the date and take the mean.
new_df = df.groupby('DATE').mean()
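A small self-contained sketch of the group-then-aggregate approach, using made-up values for DATE and RATE. Selecting the RATE column before calling mean avoids trying to average any other columns, and as_index=False keeps DATE as an ordinary column in the result.

```python
import pandas as pd

# Hypothetical data: several RATE values per DATE.
df = pd.DataFrame({
    "DATE": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"],
    "RATE": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One row per day with the mean RATE for that day.
daily_mean = df.groupby("DATE", as_index=False)["RATE"].mean()
print(daily_mean)
```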
I have an hourly (timestamp) dataset of events from the past month.
I would like to check the performance of events that occurred between certain hours, group them together and average the results.
For example: AVG income of the hours 23:00-02:00 per user:
So if I have the data set below, I'd like to summarise the coloured rows and then average them (the result should be 218).
I tried NTILE but it couldn't divide the data properly, ignoring the irrelevant hours.
Is there a good way to create these custom buckets using SQL?
From the description I'm not exactly sure how you want to aggregate. If you provide an example dataset I can update the answer.
However you can easily achieve this with an AVG and IF statement.
AVG(IF(EXTRACT(HOUR FROM timestamp_field) BETWEEN 0 AND 4, value, NULL)) AS avg_value
Using the above you can then group by either day or month to get the aggregation level you want.
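The same conditional-average idea can be sketched in Python with pandas for a window that wraps past midnight, such as the 23:00-02:00 example in the question. The column names (user_id, ts, income) and the values are hypothetical, and treating the window as hours 23, 0 and 1 (end exclusive) is an assumption.

```python
import pandas as pd

# Hypothetical hourly events; column names are assumed, not from the question.
events = pd.DataFrame({
    "user_id": ["a", "a", "a", "a", "b", "b"],
    "ts": pd.to_datetime([
        "2021-03-01 22:00", "2021-03-01 23:00", "2021-03-02 00:00",
        "2021-03-02 01:00", "2021-03-01 23:00", "2021-03-02 12:00",
    ]),
    "income": [100, 200, 300, 400, 50, 999],
})

# The window wraps past midnight, so list its hours explicitly
# instead of using a single "between" range.
window_hours = [23, 0, 1]
in_window = events["ts"].dt.hour.isin(window_hours)

# Average income per user over the rows inside the window only.
avg_income = events[in_window].groupby("user_id")["income"].mean()
print(avg_income)
```

Listing the hours explicitly sidesteps the fact that a plain BETWEEN 23 AND 2 style range matches nothing when the start hour is larger than the end hour.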
I have a dataframe which contains data regarding CO2 levels over time and has two key columns: Year and ppm. Year goes from 1974 to 2019, and there are several rows for each year. So for example 1974 starts with a ppm of 333.34, and the very next row is also 1974 with a slightly different ppm. There are 2000+ rows in total. I want to get the average ppm for each year and plot it year by year.
I'm trying to figure out the best way to do this. Right now some things I've considered:
df_Year = df.loc[df['Year']==1975]
which would isolate all of the 1975 rows, then use
df_Year['ppm'].astype("float").mean(axis=0)
and I could then get the average that way, but that's just one year. I am thinking I could make a loop which iterates through each year and gets the average and then assigns the average ppm to a list or dictionary or something.
But it just seems kind of lengthy. Isn't there a more efficient way?
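One way to avoid the loop is to let pandas group the rows by Year and average each group in a single call. This is a minimal sketch with made-up numbers. It assumes ppm may be stored as text, which the astype("float") call in the question suggests.

```python
import pandas as pd

# Hypothetical sample; the real frame has 2000+ rows from 1974 to 2019.
df = pd.DataFrame({
    "Year": [1974, 1974, 1975, 1975],
    "ppm": ["333.0", "335.0", "331.0", "333.0"],
})

# Convert once up front rather than inside a per-year loop.
df["ppm"] = df["ppm"].astype(float)

# Series indexed by Year holding the mean ppm of each year.
yearly_mean = df.groupby("Year")["ppm"].mean()
print(yearly_mean)
```

Because the result is a Series indexed by Year, calling yearly_mean.plot() (with matplotlib installed) should draw one point per year.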
I am struggling with a DAX pattern to allow me to plot an average duration value on a chart.
Here is the problem: My dataset has a field called dtOpened which is a date value describing when something started, and I want to be able to calculate the duration in days since that date.
I then want to be able to create an average duration since that date over a time period.
It is very easy to do when thinking about the value as it is now, but I want to be able to show a chart that describes what that average value would have been over various time periods on the x-axis (month/quarter/year).
The problem that I am facing is that if I create a calculated column to find the current age (NOW() - [dtOpened]), then it always uses the NOW() function - which is no use for historic time spans. Maybe I need a Measure for this, rather than a calculated column, but I cannot work out how to do it.
I have thought about using LASTDATE (rather than NOW) to work out what the last date would be in the filter context of any single month/quarter/year, but if the current month is only half way through, then it would probably need to consider today's date as the value from which to subtract the dtOpened value.
I would appreciate any help or pointers that you can give me!
It looks like you have a table (let's call it Cases) storing your cases with one record per case with fields like the following:
casename, dtOpened, OpenClosedFlag
You should create a date table with one record per day spanning your date range. The date table will have a month ending date field identifying the last day of the month (same for quarter & year). But this will be a disconnected date table. Don't create a relationship between the Date on the Date table and your case open date.
Then use the AVERAGEX iterator to average the date differences.
Average Duration (days) :=
CALCULATE (
    AVERAGEX ( Cases, MAX ( DateTable[Month Ending] ) - Cases[dtopened] ),
    FILTER ( Cases, Cases[OpenClosedFlag] = "Open" ),
    FILTER ( Cases, Cases[dtopened] <= MAX ( DateTable[Month Ending] ) )
)
Once you plot the measure against your Month you should see the average values represented correctly. You can do something similar for quarter & year.
You're a genius, Rory; Thanks.
In my example, I had a dtClosed field rather than an Opened/Closed flag, so there was one extra piece of filtering to do to test if the Case was closed at that point in time. So my measure ended up looking like this:
Average Duration:=CALCULATE(
    AVERAGEX(CasesOnly, MAX(DT[LastDateM]) - CasesOnly[Owner Opened dtOnly]),
    FILTER(CasesOnly, OR(ISBLANK(CasesOnly[Owner Resolution dtOnly]),
        CasesOnly[Owner Resolution dtOnly] > MAX(DT[LastDateM]))),
    FILTER(CasesOnly, CasesOnly[Owner Opened dtOnly] <= MAX(DT[LastDateM]))
)
And to get the chart, I plotted the DT[Date] field on the x-axis.
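To make the logic of the measure easier to check outside the data model, here is a rough Python sketch of the same "as of" calculation with pandas. It is not DAX and not the poster's actual model. The column names (opened, resolved), the sample cases and the helper average_open_age are all hypothetical.

```python
import pandas as pd

# Hypothetical cases: a missing resolved date means the case is still open.
cases = pd.DataFrame({
    "opened": pd.to_datetime(["2020-01-10", "2020-01-20", "2020-02-15"]),
    "resolved": pd.to_datetime(["2020-02-05", None, None]),
})

# Reporting dates, playing the role of the month-ending column.
month_ends = pd.to_datetime(["2020-01-31", "2020-02-29"])

def average_open_age(cases, as_of):
    """Mean age in days of cases that were open on the as_of date."""
    opened_by_then = cases["opened"] <= as_of
    still_open = cases["resolved"].isna() | (cases["resolved"] > as_of)
    ages = as_of - cases.loc[opened_by_then & still_open, "opened"]
    return ages.dt.days.mean()

# One average per reporting date, keyed by ISO date string.
avg_by_month = {
    d.date().isoformat(): average_open_age(cases, d) for d in month_ends
}
print(avg_by_month)
```

The two boolean masks correspond to the two FILTER arguments: one keeps cases opened on or before the reporting date, the other keeps cases not yet resolved at that date.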
Thanks very much again.