Filtering and calculating mean within groups ggplot2

I'm working with a large df and trying to make some plots by filtering the data on different attributes of interest. Let's say my df looks like:
df(Site=c(A,B,C,D,E), Subsite=c(w,x,y,z), date=c(01/01/1985, 05/01/1985, 16/03/1995, 24/03/1995), Species=c(1,2,3,4), Year=c(1985,1990,1995,2012), `julian day`=c(1,2,3,4), Month=c(6,7,8,11))
I would like to plot the average julian day per month in each year in which a species was present in a Subsite and Site. So far I've got this code, but the average is calculated for each month over all the years in my df rather than per year. Any help/directions would be welcome!
library(dplyr)
library(ggplot2)

Plot1 <- df %>%
  filter(Site == "A", Year > 1985, Species == "2") %>%
  group_by(Month) %>%  # groups by month only, so the mean pools all years
  mutate(Day = mean(`julian day`)) %>%
  ggplot(aes(x = Year, y = Day, color = Species)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point",
               shape = 1, size = 1, show.legend = FALSE) +
  stat_summary(fun = mean, colour = "red", geom = "text", show.legend = FALSE,
               vjust = -0.7, size = 3, aes(label = round(after_stat(y), digits = 0)))
Thanks!

I think I spotted the error.
I was missing Year in the grouping:
  group_by(Month, Year) %>%
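To see why the extra grouping variable matters, here is a minimal sketch on made-up data (the `toy` tibble and its `julian_day` column are hypothetical, not the asker's df). Grouping by Month alone pools every year into one mean; grouping by Month and Year gives one mean per month within each year.

```r
library(dplyr)

# Hypothetical toy data: one month observed in two different years
toy <- tibble(
  Year       = c(1990, 1990, 1995, 1995),
  Month      = c(6, 6, 6, 6),
  julian_day = c(160, 170, 150, 156)
)

# Month only: a single mean across all years
by_month <- toy %>%
  group_by(Month) %>%
  mutate(Day = mean(julian_day)) %>%
  ungroup()

# Month and Year: a separate mean for each year
by_month_year <- toy %>%
  group_by(Month, Year) %>%
  mutate(Day = mean(julian_day)) %>%
  ungroup()

by_month$Day       # 159 159 159 159
by_month_year$Day  # 165 165 153 153
```

If you only need one row per group for plotting, `summarise()` in place of `mutate()` would collapse each group to a single value.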

Related

Plotting grouped data in R (both numeric and categorical variables in X axis)

Super new to R here. I'm trying to plot a graph to visualise my aggregated group data (it consists of numeric and categorical data). Please help, anyone!
DD %>%
  select(Age_start_treatment, Skeletal_AP, Sex, Treatment_time) %>%
  group_by(Age_start_treatment, Skeletal_AP, Sex) %>%
  summarize(avg_total_treatment_time = mean(Treatment_time, na.rm = TRUE))
I can't figure out the next step for the life of me, but I know I need to use ggplot().
I need the best chart to plot the patients' age, skeletal class dimension (I, II or III) and sex against the total treatment time.
Thanks
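One possible next step, as a rough sketch only: pipe the summary into ggplot() with age on the x axis, mean treatment time on the y axis, sex as colour and one facet per skeletal class. The `DD` data frame below is invented purely so the example runs; the column names are taken from the question, and whether this is the "best" chart depends on the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's DD data frame
DD <- data.frame(
  Age_start_treatment = c(11, 12, 12, 13, 14, 14),
  Skeletal_AP         = c("I", "II", "III", "I", "II", "III"),
  Sex                 = c("F", "M", "F", "M", "F", "M"),
  Treatment_time      = c(24, 30, 22, 18, 27, 36)
)

summary_df <- DD %>%
  group_by(Age_start_treatment, Skeletal_AP, Sex) %>%
  summarise(avg_total_treatment_time = mean(Treatment_time, na.rm = TRUE),
            .groups = "drop")

# Age on x, mean treatment time on y, sex as colour, one panel per skeletal class
p <- ggplot(summary_df,
            aes(x = Age_start_treatment,
                y = avg_total_treatment_time,
                colour = Sex)) +
  geom_point() +
  facet_wrap(~ Skeletal_AP)

print(p)
```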

forecasting in time series in R with dplyr and ggplot

Hope all goes well.
I have a data set that I can share a small piece of it:
date=c("2022-08-01","2022-08-02","2022-08-03","2022-08-04",
"2022-08-05","2022-08-06")
sold_items=c(12,18,9,31,19,10)
df <- data.frame(date=as.Date(date),sold_items)
df %>% sample_n(5)
date sold_items
1 2022-08-04 31
2 2022-08-03 9
3 2022-08-01 12
4 2022-08-06 10
5 2022-08-02 18
I need to forecast the number of sold items for the next two weeks (14 days after the last available date in the data).
I also need to show the forecasted data along with the current data on one graph using ggplot.
I have been looking into the forecast package to use ARIMA, but I am lost and could not convert this data to a time series object.
I wonder if someone can provide a solution with dplyr to my problem.
Thank you very much.
# First create the df (fpp3 attaches tibble, dplyr, tsibble and fable)
library(fpp3)
df <-
  tibble(
    sold = c(12, 18, 9, 31, 19, 10),
    date = seq(as.Date("2022-08-01"),
               by = "day",
               length.out = length(sold))) %>%
  relocate(date)
# Then coerce to a tsibble object and model:
df %>%
  as_tsibble(index = date) %>%
  model(ARIMA(sold)) %>%
  forecast(h = 14)
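The question also asked for the forecast and the observed data on one ggplot. A sketch of how that might look, assuming the fpp3 packages are installed: `autoplot()` on the forecast object accepts the original tsibble as its second argument and overlays the history. The names `sales`, `fc` and `arima` are my own, and with only six observations the fitted model should not be taken seriously.

```r
library(fpp3)

# Same six observations as in the question, stored as a tsibble
sales <- tibble(
  date = seq(as.Date("2022-08-01"), by = "day", length.out = 6),
  sold = c(12, 18, 9, 31, 19, 10)
) %>%
  as_tsibble(index = date)

# Fit an automatically selected ARIMA model and forecast 14 days ahead
fc <- sales %>%
  model(arima = ARIMA(sold)) %>%
  forecast(h = 14)

# Forecast (with prediction intervals) drawn on top of the observed series
p <- autoplot(fc, sales) +
  labs(x = "Date", y = "Items sold")

print(p)
```

Since `autoplot()` returns a ggplot object, further layers and themes can be added with `+` as usual.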

Grouping boxplot together

I have the following data and would like to create a grouped boxplot. I created a bar graph in Excel and would like to create the boxplot in exactly the same way using R (see bargraph here). I tried using ggplot but was unsuccessful. Any help will be appreciated. Thank you.
Fruit spring summer fall
Banana 19.36 91.51 49.99
Apple 65.27 51.55 42.83
orange 16.21 94.71 62.33
It's not clear exactly what you are looking for. To get the bar graph, you can do something like this:
library(tidyverse)
df %>%
pivot_longer(!Fruit) %>%
ggplot(aes(x = Fruit, y = value, fill = name)) +
geom_bar(position="dodge", stat="identity")
As for a boxplot, you would need more than one data point per box, but if you wanted a boxplot by Fruit (pooling the three seasons), you could do something like this:
df %>%
pivot_longer(!Fruit) %>%
ggplot(aes(x = Fruit, y = value, fill = Fruit)) +
geom_boxplot()
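Both snippets above assume a data frame called df already exists. A minimal sketch of building it from the table in the question and reshaping it the way the answer does (the `long` name is mine; `name` and `value` are the default column names produced by `pivot_longer()`):

```r
library(tidyverse)

# The table from the question, typed in as a tibble
df <- tribble(
  ~Fruit,   ~spring, ~summer, ~fall,
  "Banana", 19.36,   91.51,   49.99,
  "Apple",  65.27,   51.55,   42.83,
  "orange", 16.21,   94.71,   62.33
)

# Wide to long: one row per Fruit/season pair
long <- df %>%
  pivot_longer(!Fruit)

long
```

Each Fruit/season pair has a single value here, which is why a dodged bar chart works but a boxplot per pair would collapse to a flat line.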

Plotting dates by weekdays and groups

I would like to compare the values from different weeks in different groups. Something like daily sales for two team members by week to demonstrate the effect of one person being off/a holiday etc. The time of the sale within each day needs to be ordered within the day but the x axis should be labeled by day.
Example is arbitrary.
Example data and output
options(stringsAsFactors = FALSE)
library(lubridate)
library(tidyverse)
library(magrittr)
#=======================
# Week on week comparison of days by a group
#=======================
# Generate DF
Date <- data.frame(Date = rep(seq(as.Date("2020-04-01"),as.Date("2020-04-14"),by="days"),4))
Time <- data.frame(Time = c(rep("00:00:01",nrow(Date)/2),rep("00:00:02",nrow(Date)/2)))
Type <- data.frame(Type = rep(c(rep("a",nrow(Date)/4),rep("b",nrow(Date)/4)),2))
df <- cbind(Date,Time,Type)
# Add random values to plot
df %<>% mutate(values = runif(nrow(.),1,10))
# Create groups for weeks, an ordering for days, and labels as weekdays (character strings).
df %<>% mutate(weekLevel = week(Date),
dayLevel = wday(Date),
Day = as.character(weekdays(Date)),
orderVar = paste0(dayLevel, Time))
ggplot(df %>% arrange(orderVar), aes(x = orderVar, y = values,group = interaction(Type,weekLevel),colour=Type))+
geom_line()+
scale_x_discrete(breaks =df$orderVar , labels = df$Day) +
theme(axis.text.x = element_text(angle = 90, hjust=1))
This works, but the day labels are repeated because the breaks are set at a more granular level than the labels. It also feels a bit hacky.
Any and all feedback is appreciated :)
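One way to avoid the repeated labels, sketched under the assumption that a single tick per weekday is enough: keep orderVar on the x axis, but pass only the first orderVar of each day as the break. The `day_breaks` helper table is my own addition; the data setup mirrors the question, rewritten with dplyr and a fixed seed.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

set.seed(1)

# Same shape as the example data in the question (56 rows)
df <- data.frame(
  Date = rep(seq(as.Date("2020-04-01"), as.Date("2020-04-14"), by = "days"), 4),
  Time = rep(c("00:00:01", "00:00:02"), each = 28),
  Type = rep(rep(c("a", "b"), each = 14), 2)
) %>%
  mutate(values    = runif(n(), 1, 10),
         weekLevel = week(Date),
         dayLevel  = wday(Date),
         Day       = weekdays(Date),
         orderVar  = paste0(dayLevel, Time))

# One break per weekday: the earliest orderVar within each day
day_breaks <- df %>%
  group_by(dayLevel, Day) %>%
  summarise(orderVar = min(orderVar), .groups = "drop")

p <- ggplot(df, aes(x = orderVar, y = values,
                    group = interaction(Type, weekLevel), colour = Type)) +
  geom_line() +
  scale_x_discrete(breaks = day_breaks$orderVar, labels = day_breaks$Day) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(p)
```

The points within a day keep their order on the axis; only the labelling changes, so each weekday name appears once.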

How to add two boxplots in a same graph in ggplot2

I have this sample data.
sample <- data.frame(sample = 1:12,
site = c('A','A','A','B','B','B','A','A','A','B','B','B'),
month = c(rep('Feb', 6), rep('Aug', 6)),
Ar = c(7,8,9,8,9,9,4,5,7,5,8,9))
And created two boxplots
ggplot(sample, aes(x=factor(month), y=Ar)) +
geom_boxplot(aes(fill=site))
ggplot(sample, aes(x=factor(month), y=Ar)) +
geom_boxplot()
I wonder if there is a way to combine them in the same graph so that total, site A and site B are right next to each other per each month.
You could utilize dplyr (via the tidyverse package) and reshape2.
library(dplyr)
library(reshape2)
library(ggplot2)
sample %>%
  dplyr::select(-sample) %>%             # drop the unused id column
  mutate(global = 'Global') %>%          # extra column so the pooled data gets its own box
  melt(., id.vars = c("month", "Ar")) %>%
  ggplot(aes(month, Ar)) + geom_boxplot(aes(fill = value))
This drops the sample column as you aren't currently using it, adds the term global in a separate column, reshapes the data via the melt function and generates a figure. Note that I changed the input code format in your original question. With the changes to the data.frame you no longer need to coerce the variables to factors.
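For what it's worth, the same reshaping can be sketched with tidyr's `pivot_longer()` instead of reshape2's `melt()`. This is an alternative I am swapping in, not the answer's original method; the names `long`, `variable` and `value` are my choices, picked to mirror melt's defaults.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

sample <- data.frame(sample = 1:12,
                     site   = c('A','A','A','B','B','B','A','A','A','B','B','B'),
                     month  = c(rep('Feb', 6), rep('Aug', 6)),
                     Ar     = c(7,8,9,8,9,9,4,5,7,5,8,9))

# Stack the site column and a constant "Global" column into one grouping column
long <- sample %>%
  select(-sample) %>%
  mutate(global = "Global") %>%
  pivot_longer(c(site, global), names_to = "variable", values_to = "value")

# Three boxes per month: site A, site B and the pooled "Global" data
p <- ggplot(long, aes(x = month, y = Ar, fill = value)) +
  geom_boxplot()

print(p)
```

Every observation appears twice in `long` (once under its site, once under "Global"), which is what puts the total next to the per-site boxes.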