Plotting grouped data in R (both numeric and categorical variables in X axis) - ggplot2

I'm very new to R. I'm trying to plot a graph to visualise my aggregated group data, which contains both numeric and categorical variables. Any help would be appreciated!
DD %>%
select(Age_start_treatment, Skeletal_AP, Sex, Treatment_time) %>%
group_by(Age_start_treatment, Skeletal_AP, Sex) %>%
summarize(avg_total_treatment_time = mean(Treatment_time, na.rm=TRUE))
I can't figure out the next step, but I know I need to use ggplot().
I need the best chart to plot the patients' age, skeletal class (I, II or III) and sex against the total treatment time.
Thanks

Related

Filtering and calculating mean within groups ggplot2

I'm working with a large df and trying to make some plots by filtering the data on different attributes of interest. Let's say my df looks like:
df(site=c(A,B,C,D,E), subsite=c(w,x,y,z), date=c(01/01/1985, 05/01/1985, 16/03/1995, 24/03/1995), species=c(1,2,3,4), Year=c(1985,1990,1995,2012), julian day=c(1,2,3,4), Month=c(6,7,8,11).
I would like to plot the average julian day per month for each year in which a species was present in a Subsite and Site. So far I've got this code, but the average is calculated for each month over all the years in my df rather than per year. Any help or directions would be welcome!
Plot1<- df %>%
filter(Site=="A", Year>1985, Species =="2")%>%
group_by(Month) %>%
mutate(Day = mean(`julian day`)) %>%
ggplot(aes(x=Year, y=Day, color=Species)) +
geom_boxplot() +
stat_summary(fun=mean, geom="point",
shape=1, size=1, show.legend=FALSE) +
stat_summary(fun=mean, colour="red", geom="text", show.legend = FALSE,
vjust=-0.7,size=3, aes(label=round(..y.., digits=0)))
Thanks!
I think I spotted the error.
I was missing this:
group_by(Month, Year) %>%

Grouping the factors in ggplot

I am trying to create a graph based on a matrix similar to the one below. I am trying to group the Erosion values by "Slope".
library(ggplot2)
new_mat<-matrix(,nrow = 135, ncol = 7)
colnames(new_mat)<-c("Scenario","Runoff (mm)","Erosion (t/ac)","Slope","Soil","Tillage","Rotation")
for ( i in 1:nrow(new_mat)){
new_mat[i,2]<-sample(10:50, 1)
new_mat[i,3]<-sample(0.1:20, 1)
new_mat[i,4]<-sample(c("S2","S3","S4","S5","S1"),1)
new_mat[i,5]<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,7]<-sample(c("WBP","WBF","WF"),1)
new_mat[i,6]<-sample(c("Intense","Reduced","Notill"),1)
new_mat[i,1]<-paste0(new_mat[i,4],"_",new_mat[i,5],"_",new_mat[i,6],"_",new_mat[i,7],"_")
}
#### Graph part ########
grphs_mat<-as.data.frame(new_mat)
grphs_mat$`Runoff (mm)`<-as.numeric(as.character(grphs_mat$`Runoff (mm)`))
grphs_mat$`Erosion (t/ac)`<-as.numeric(as.character(grphs_mat$`Erosion (t/ac)`))
ggplot(grphs_mat, aes(Scenario, `Erosion (t/ac)`,group=Slope, colour = Slope))+
scale_y_continuous(limits=c(0,max(as.numeric((grphs_mat$`Erosion (t/ac)`)))))+
geom_point()+geom_line()
But when I run this code, the values are spread along the x-axis across all 135 scenarios. What I want is for the grouping to be done by slope, while scenarios that share the other factors (Soil + Rotation + Tillage) are placed at the same x-axis position. For example:
For these five scenarios:
S1_Deep_Intense_WBF_
S2_Deep_Intense_WBF_
S3_Deep_Intense_WBF_
S4_Deep_Intense_WBF_
S5_Deep_Intense_WBF_
The plot should separate S1 to S5 but recognise that the other factors are the same, placing those scenarios at the same x-axis position, so that the slope lines are stacked on top of each other across 135/5 = 27 x-axis points. The final figure should look like this (see image). Apologies for not being able to explain it better.
I think I am making a mistake in grouping or in assigning the x-axis values.
I would appreciate your suggestions.
In the example you give, I didn't get every possible factor combination represented so the plots looked a bit weird. What I did instead was start with the following:
set.seed(42)
new_mat <- matrix(,nrow = 1000, ncol = 7)
I then deduplicated this by summarising the values. A possibly relevant step for your analysis is that I made a new variable with the interaction() function, which is the combination of the three other factors.
library(tidyverse)
df <- grphs_mat
df$x <- with(df, interaction(Rotation, Soil, Tillage))
# The simulation did not yield unique combinations
df <- df %>% group_by(x, Slope) %>%
summarise(n = sum(`Erosion (t/ac)`))
Next, I plotted this new x variable on the x-axis and used "stack" positions for the lines and points.
g <- ggplot(df, aes(x, y = n, colour = Slope, group = Slope)) +
geom_line(position = "stack") +
geom_point(position = "stack")
To make the x-axis slightly more readable, you can replace the dots that interaction() inserts between factor levels with newlines.
g + scale_x_discrete(labels = function(x){gsub("\\.", "\n", x)})
Another option is to simply rotate the x axis labels:
g + theme(axis.text.x.bottom = element_text(angle = 90))
There are a few additional options for the x-axis if you go into ggplot2 extension packages.

How to visualize 'suicides_no' w.r.t 'gdp_per_capita ($)' for a given country over the years, in the following data frame

The DataFrame can be viewed here: Global Suicide Dataset
I have made a pivot table with country and year as indices using the following code:
df1 = pd.pivot_table(df, index=['country', 'year'],
                     values=['suicides_no', 'gdp_per_capita ($)', 'population', 'suicides/100k pop'],
                     aggfunc={"suicides_no": np.sum,
                              "gdp_per_capita ($)": np.mean,
                              "population": np.mean,
                              "suicides/100k pop": np.mean})
Now, for my project, I want to visualise how suicides_no varies with gdp_per_capita for a country over the years, but I am unable to plot it. Can somebody please help me out?
First, let's convert the index levels to columns using df1.reset_index(inplace=True)
Now, you can draw this in a scatter plot where the main features are - Year (preferably on x-axis) and suicides_no (on y-axis). The gdp_per_capita will go as size of the dots.
In this case you have two options:
Draw different plots for each country. (gdp will be shown as hue)
sns.catplot(x='year', y='suicides_no', row='country', hue='gdp_per_capita ($)', data=df1)
Draw everything in a single plot. Scatter plot with GDP as dot size, and Country as Color (hue)
sns.scatterplot(x='year', y='suicides_no', hue='country', size='gdp_per_capita ($)', data=df1)
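The reshaping step that both plotting options rely on can be sketched on a tiny frame. This is only an illustration: the numbers are invented rather than taken from the dataset, only two of the four value columns are included, and the aggregations are given as the strings "sum" and "mean" instead of np.sum and np.mean.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the numbers are invented for illustration.
df = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B"],
    "year": [2000, 2000, 2001, 2001, 2000, 2000],
    "suicides_no": [10, 20, 5, 15, 1, 2],
    "gdp_per_capita ($)": [100.0, 100.0, 110.0, 110.0, 50.0, 50.0],
})

# Same idea as the question's pivot table: one row per (country, year).
df1 = pd.pivot_table(
    df,
    index=["country", "year"],
    values=["suicides_no", "gdp_per_capita ($)"],
    aggfunc={"suicides_no": "sum", "gdp_per_capita ($)": "mean"},
)

# Turn the MultiIndex levels back into ordinary columns so that plotting
# functions can refer to 'country' and 'year' by name.
flat = df1.reset_index()
print(flat)
```

After reset_index(), 'country' and 'year' are regular columns, which is what the x=, hue= and row= arguments in the seaborn calls above expect.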

How to plot a stacked bar using the groupby data from the dataframe in python?

I am reading a huge CSV file using the pandas module.
filename = pd.read_csv(filepath)
I then converted it to a DataFrame:
df = pd.DataFrame(filename, index=None)
From the CSV file, I am interested in three columns named country, year, and value.
I grouped by country name, summed the values, and plotted the result as a bar graph with the following code:
df.groupby('country').value.sum().plot(kind='bar')
where, x axis is country and y axis is value.
Now I want to turn this bar graph into a stacked bar chart, using the third column, year, so that differently coloured segments represent each year. I am looking for an easy way to do this.
Note that, year column contains years from 2000 to 2019.
Thanks.
From what I understand, you should try something like:
df.groupby(['country', 'year']).value.sum().unstack().plot(kind='bar', stacked=True)
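To see what the groupby/unstack chain produces before it is plotted, here is a minimal sketch on made-up data. The frame and the helper name plot_stacked are hypothetical; the column names country, year and value are the ones given in the question, and fill_value=0 is an added assumption so that missing country/year pairs show as empty segments rather than NaN.

```python
import pandas as pd

# Hypothetical data with the three columns named in the question.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [2000, 2000, 2001, 2000, 2001],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Sum per (country, year), then move 'year' from the index to the columns.
# Each row is one bar, each column is one stacked segment.
wide = df.groupby(["country", "year"])["value"].sum().unstack(fill_value=0)
print(wide)


def plot_stacked(table):
    """Draw one bar per row of `table`, stacked by column (needs matplotlib)."""
    return table.plot(kind="bar", stacked=True)
```

Calling plot_stacked(wide) should draw one bar per country with one coloured segment per year, provided matplotlib is installed.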

How to add two boxplots in a same graph in ggplot2

I have this sample data.
sample <- data.frame(sample = 1:12,
site = c('A','A','A','B','B','B','A','A','A','B','B','B'),
month = c(rep('Feb', 6), rep('Aug', 6)),
Ar = c(7,8,9,8,9,9,4,5,7,5,8,9))
And created two boxplots
ggplot(sample, aes(x=factor(month), y=Ar)) +
geom_boxplot(aes(fill=site))
ggplot(sample, aes(x=factor(month), y=Ar)) +
geom_boxplot()
I wonder if there is a way to combine them in the same graph so that the total, site A and site B boxes sit right next to each other for each month.
You could utilize dplyr (via the tidyverse package) and reshape2.
library(dplyr)
library(reshape2)
library(ggplot2)
sample%>%
dplyr::select(-sample) %>%
mutate(global = 'Global') %>%
melt(., id.vars=c("month", "Ar")) %>%
ggplot(aes(month, Ar)) + geom_boxplot(aes(month, Ar, fill=value))
This drops the sample column, as you aren't currently using it, adds the term Global in a separate column, reshapes the data with the melt function, and generates the figure. Note that I changed the input code format in your original question. With the changes to the data.frame, you no longer need to coerce the variables to factors.