ggplot sf variable factor with specific point colour, and size, with sf geometry - ggplot2

I have a shapefile of points (lat/lon), transformed to CRS 4283
( shp_4283 <- sf::st_transform(shp, crs = 4283) )
and 3 variables. I would like to plot each level of the
$Substrate factor in its own colour, at its $geometry location,
with geom_sf:
ggplot() +
geom_sf(data = subset(shp_4283, Substrate == "Sand", show.legend = "point"), #aes(shape = YOU),
color = "yellow", size = 1) +
geom_sf(data = subset(shp_4283, Substrate == "Mixed reef and sand", show.legend = "point"), #aes(shape = YOU),
color = "green", size = 1) +
geom_sf(data = subset(shp_4283, Substrate == "None modelled with certainty", show.legend = "point"), #aes(shape = YOU),
color = "grey", size = 2) +
geom_sf(data = subset(shp_4283, Substrate == "Reef", show.legend = "point"), #aes(shape = YOU),
color = "black", size = 2, show.legend = T) +
coord_sf()
The plot works, but there is no legend because nothing is mapped in aes(). My attempts to add one lead to further errors such as "Error in x[j] : invalid subscript type 'list'".
I understand that I could create a new df for each factor level, with its geometry, and then plot from that:
df <- shp_4283 %>%
# Your filter
filter(Substrate == "Reef") %>%
# 2 Extract coordinates
st_coordinates() %>%
# 3 to table /tibble
as.data.frame() %>%
Is this where I would set the column names, so that each
filtered $Substrate level is labelled appropriately in its new df?
Or is there a geom_* way to plot each factor level straight from the sf data frame, at its geometry, with a legend mapping colour to level?

What worked: geom_sf() with subset() and one explicit layer per group.
The smallest data-group is drawn last, so that it sits on top of the progressively larger data-groups beneath.
Col = c('green','grey','black','yellow') #define data-group colours
jr_map <- ggplot() +
geom_sf(data = shp_4283, aes(color = Substrate), show.legend = "point") + # map all data-groups to colour so that the legend appears; colours are set by scale_color_manual() below
# individually draw the separate groups, by factor level, with their own colour and point size
# (show.legend is an argument of geom_sf(), not of subset(); these constant-colour layers add nothing to the legend)
geom_sf(data = subset(shp_4283, Substrate == "Sand"), color = "yellow", size = 1) +
geom_sf(data = subset(shp_4283, Substrate == "Mixed reef and sand"), color = "green", size = 1) +
geom_sf(data = subset(shp_4283, Substrate == "None modelled with certainty"), color = "grey", size = 2) +
geom_sf(data = subset(shp_4283, Substrate == "Reef"), color = "black", size = 2) +
# retrieve geometry
coord_sf() +
# add label for map projection/CRS
annotate(geom = "text", x = 115.095:115.1, y = -33.62:-33.6, label = "crs = 4283",
fontface = "italic", color = "grey22", size = 2.5) +
#geom_text(data=bb_df, aes(x=115,y=35), fill='black', color='black', alpha=.1, angle=35) + # incase an on-map descriptive label needed
# add north arrow
annotation_north_arrow(location = "bl", which_north = "true",
pad_x = unit(0.25, "cm"),
pad_y = unit(0.25, "cm"),
height = unit(0.6, "cm"), width = unit(0.6, "cm"),
style = north_arrow_fancy_orienteering) +
theme_bw() + # remove 'azure' ocean color
ggtitle("Jolly-Rog Drops" , subtitle = "Geographe Bay, Western Australia") +
xlab("Longitude") +
ylab("Latitude") +
## Legend: custom title and colours (Col is specified manually at the start, ordered according to the $Substrate factor levels)
scale_color_manual(name = "Substrate", values = Col) +
# override.aes changes key shape and size only when the legend keys are drawn as points
# (show.legend = "point" on the mapped geom_sf() layer); legend text size is set in theme(), not in guide_legend()
guides(color = guide_legend(override.aes = list(shape = c(2,2,2,2), size = 1, fill = Col), nrow = 2, byrow = TRUE)) +
#theme(legend.position = c(0, 0))
theme(legend.position = "bottom")
jr_map
Thanks also to this post (geom_ explanation).
The different levels of $Substrate are now (1) plotted at their geometry, and (2) their relative visibility is easy to adjust, because it depends on where each layer is added after the initial ggplot() call (the last layer added is drawn on top and is the most prominent). The geom_sf() points (not geom_point()) are then made prominent through colour, shape, and size (alpha is also available).
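As an alternative to one subset() layer per group, the same picture can be sketched with a single geom_sf() layer in which Substrate is mapped to both colour and size, so the legend comes for free. This is only a sketch under assumptions: pts_sf is a made-up stand-in for shp_4283, the coordinates are invented, and the claim that points are drawn in row order is my understanding of geom_sf() rather than something stated in the thread.

```r
library(ggplot2)
library(sf)

# Hypothetical stand-in for shp_4283: 40 invented points with a Substrate column
set.seed(1)
pts <- data.frame(
  lon = runif(40, 115.0, 115.4),
  lat = runif(40, -33.7, -33.5),
  Substrate = rep(
    c("Sand", "Mixed reef and sand", "None modelled with certainty", "Reef"),
    times = c(20, 12, 4, 4)
  )
)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4283)

# Assumption: points are drawn in row order, so rows of the groups that
# should end up on top are sorted to the end of the data
draw_order <- c("Sand", "Mixed reef and sand", "None modelled with certainty", "Reef")
pts_sf <- pts_sf[order(match(pts_sf$Substrate, draw_order)), ]

# Named vectors tie each level to its colour and size, whatever the level order
cols  <- c("Sand" = "yellow", "Mixed reef and sand" = "green",
           "None modelled with certainty" = "grey", "Reef" = "black")
sizes <- c("Sand" = 1, "Mixed reef and sand" = 1,
           "None modelled with certainty" = 2, "Reef" = 2)

p <- ggplot(pts_sf) +
  geom_sf(aes(colour = Substrate, size = Substrate), show.legend = "point") +
  scale_colour_manual(name = "Substrate", values = cols) +
  scale_size_manual(name = "Substrate", values = sizes) +
  theme_bw() +
  theme(legend.position = "bottom")
p
```

Because both scales share the same name and levels, ggplot2 merges them into one legend whose keys show the colour and size of each level.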

Related

how to color different datasets separately when overlapping them using geom_smooth and color settings

I have 2 datasets that span full genomes, separated by chromosomes (scaffolds), for 2 group comparisons, and I want to overlay them in a single graph.
This is how I was doing it:
ggplot(NULL, aes(color = as_factor(scaffold))) +
geom_smooth(data = windowStats_SBvsOC, aes(x = mid2, y = Fst_group1_group5), se=F) +
geom_smooth(data = windowStats_SCLvsSCU, aes(x = mid2, y = Fst_group3_group4), se=F) +
scale_y_continuous(expand = c(0,0), limits = c(0, 1)) +
scale_x_continuous(labels = chrom$chrID, breaks = axis_set$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), unique(length(chrom$chrID)))) +
scale_size_continuous(range = c(0.5,3)) +
labs(x = NULL,
y = "Fst (smoothed means)") +
theme_minimal() +
theme(
legend.position = "none",
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.title.y = element_text(),
axis.text.x = element_text(angle = 60, size = 8, vjust = 0.5))
This way, each chromosome gets alternating colours and the smoothing is done per chromosome. But I want the colours to differ between the 2 groups, so that I can tell them apart when they overlap like this. Is there a way to do it? I can only manage it once I remove the colour by scaffold, but then the smoothing is done across the whole genome, and I don't want that!
My dataset is big, so I'm attaching it here!
I'm running this in RStudio 2022.02.3, R v.3.6.2, with the ggplot2 package.
EDIT: I've figured it out! I just needed to change color = as_factor(scaffold) to group = as_factor(scaffold), and then add aes(color) to each geom_smooth() call.
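The fix in the EDIT can be sketched on toy data as follows. The data frames stats_a and stats_b, the helper make_windows(), the column name fst, and the legend labels are all hypothetical stand-ins for the real window statistics. The point is only that group controls where the smoother restarts, while colour is free to identify the dataset.

```r
library(ggplot2)

# Hypothetical stand-ins for windowStats_SBvsOC and windowStats_SCLvsSCU
set.seed(42)
make_windows <- function(offset) {
  data.frame(
    scaffold = rep(c("chr1", "chr2"), each = 50),
    mid2     = 1:100,
    fst      = pmin(pmax(offset + rnorm(100, sd = 0.05), 0), 1)
  )
}
stats_a <- make_windows(0.2)
stats_b <- make_windows(0.5)

# group = scaffold keeps one smooth per chromosome;
# a constant colour label inside aes() gives one colour per dataset
p <- ggplot(mapping = aes(x = mid2, y = fst, group = factor(scaffold))) +
  geom_smooth(data = stats_a, aes(colour = "SB vs OC"),
              method = "loess", formula = y ~ x, se = FALSE) +
  geom_smooth(data = stats_b, aes(colour = "SCL vs SCU"),
              method = "loess", formula = y ~ x, se = FALSE) +
  scale_colour_manual(name = NULL,
                      values = c("SB vs OC" = "#276FBF", "SCL vs SCU" = "#183059")) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(x = NULL, y = "Fst (smoothed means)") +
  theme_minimal()
p
```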

What is Julia's equivalent ggplot code of R's?

I would like to plot a sophisticated graph in Julia. The code below is my Julia version, which calls ggplot2 through RCall.
using CairoMakie, DataFrames, Effects, GLM, StatsModels, StableRNGs, RCall
@rlibrary ggplot2
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
sex=repeat(["male", "female"], inner=8),
weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
mod_uncentered = lm(@formula(weight ~ 1 + sex * age), growthdata)
refgrid = copy(growthdata)
filter!(refgrid) do row
return mod(row.age, 2) == (row.sex == "male")
end
effects!(refgrid, mod_uncentered)
refgrid[!, :lower] = @. refgrid.weight - 1.96 * refgrid.err
refgrid[!, :upper] = @. refgrid.weight + 1.96 * refgrid.err
df= refgrid
ggplot(df, aes(x=:age, y=:weight, group = :sex, shape= :sex, linetype=:sex)) +
geom_point(position=position_dodge(width=0.15)) +
geom_ribbon(aes(ymin=:lower, ymax=:upper), fill="gray", alpha=0.5)+
geom_line(position=position_dodge(width=0.15)) +
ylab("Weight")+ xlab("Age")+
theme_classic()
However, I would like to modify this graph a bit more. For example, I would like to change the scale of the y axis and the colours of the ribbon, add some error bars, change the text size of the legend, and so on. Since I am new to Julia, I am not succeeding in finding the equivalent code for these modifications. Could someone help me translate the R ggplot code below into Julia?
t1= filter(df, sex=="male") %>% slice_max(df$weight)
ggplot(df, aes(age, weight, group = sex, shape= sex, linetype=sex,fill=sex, colour=sex)) +
geom_line(position=position_dodge(width=0.15)) +
geom_point(position=position_dodge(width=0.15)) +
geom_errorbar(aes(ymin = lower, ymax = upper),width = 0.1,
linetype = "solid",position=position_dodge(width=0.15))+
geom_ribbon(aes(ymin = lower, ymax = upper, fill = sex, colour = sex), alpha = 0.2) +
geom_text(data = t1, aes(age, weight, label = round(weight, 1)), hjust = -0.25, size=7,show.legend = FALSE) +
scale_y_continuous(limits = c(70, 150), breaks = seq(80, 140, by = 20))+
theme_classic()+
scale_colour_manual(values = c("orange", "blue")) +
guides(color = guide_legend(override.aes = list(linetype = c('dotted', 'dashed'))),
linetype = "none")+
xlab("Age")+ ylab("Average marginal effects") + ggtitle("Title") +
theme(
axis.title.y = element_text(color="Black", size=28, face="bold", hjust = 0.9),
axis.text.y = element_text(face="bold", color="black", size=16),
plot.title = element_text(hjust = 0.5, color="Black", size=28, face="bold"),
legend.title = element_text(color = "Black", size = 13),
legend.text = element_text(color = "Black", size = 16),
legend.position="bottom",
axis.text.x = element_text(face="bold", color="black", size=11),
strip.text = element_text(face= "bold", size=15)
)
As I commented before, you can use R-strings to run R code. To be clear, this isn't like your post's approach where you piece together many Julia objects that wrap many R objects, this is RCall converting a Julia Dataframe to an R dataframe then running your R code.
Running an R script may not seem very Julian, but code reuse is very Julian. Besides, you're still using an R library and active R session either way, and there might even be a slight performance benefit from reducing how often you make wrapper objects and switch between Julia and R.
## import libraries for Julia and R; still good to do at top
using CairoMakie, DataFrames, Effects, GLM, StatsModels, StableRNGs, RCall
R"""
library(ggplot2)
library(dplyr)
"""
## your Julia code without the @rlibrary or ggplot lines
rng = StableRNG(42)
growthdata = DataFrame(; age=[13:20; 13:20],
sex=repeat(["male", "female"], inner=8),
weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(rng, 16))
mod_uncentered = lm(@formula(weight ~ 1 + sex * age), growthdata)
refgrid = copy(growthdata)
filter!(refgrid) do row
return mod(row.age, 2) == (row.sex == "male")
end
effects!(refgrid, mod_uncentered)
refgrid[!, :lower] = @. refgrid.weight - 1.96 * refgrid.err
refgrid[!, :upper] = @. refgrid.weight + 1.96 * refgrid.err
df= refgrid
## convert Julia's df and run your R code in R-string
## - note that $df is interpolation of Julia's df into R-string,
## not R's $ operator like in rdf$weight
## - call the R dataframe rdf because df is already an R function
R"""
rdf <- $df
t1= filter(rdf, sex=="male") %>% slice_max(rdf$weight)
ggplot(rdf, aes(age, weight, group = sex, shape= sex, linetype=sex,fill=sex, colour=sex)) +
geom_line(position=position_dodge(width=0.15)) +
geom_point(position=position_dodge(width=0.15)) +
geom_errorbar(aes(ymin = lower, ymax = upper),width = 0.1,
linetype = "solid",position=position_dodge(width=0.15))+
geom_ribbon(aes(ymin = lower, ymax = upper, fill = sex, colour = sex), alpha = 0.2) +
geom_text(data = t1, aes(age, weight, label = round(weight, 1)), hjust = -0.25, size=7,show.legend = FALSE) +
scale_y_continuous(limits = c(70, 150), breaks = seq(80, 140, by = 20))+
theme_classic()+
scale_colour_manual(values = c("orange", "blue")) +
guides(color = guide_legend(override.aes = list(linetype = c('dotted', 'dashed'))),
linetype = "none")+
xlab("Age")+ ylab("Average marginal effects") + ggtitle("Title") +
theme(
axis.title.y = element_text(color="Black", size=28, face="bold", hjust = 0.9),
axis.text.y = element_text(face="bold", color="black", size=16),
plot.title = element_text(hjust = 0.5, color="Black", size=28, face="bold"),
legend.title = element_text(color = "Black", size = 13),
legend.text = element_text(color = "Black", size = 16),
legend.position="bottom",
axis.text.x = element_text(face="bold", color="black", size=11),
strip.text = element_text(face= "bold", size=15)
)
"""
The result is the same as your post's R code:
I used Vega-Lite (https://github.com/queryverse/VegaLite.jl) which is also grounded in the "Grammar of Graphics", and LinearRegression (https://github.com/ericqu/LinearRegression.jl) which provides similar features as GLM, although I think it is possible to get comparable results with the other plotting and linear regression packages. Nevertheless, I hope that this gives you a starting point.
using LinearRegression: Distributions, DataFrames, CategoricalArrays
using DataFrames, StatsModels, LinearRegression
using VegaLite
growthdata = DataFrame(; age=[13:20; 13:20],
sex=categorical(repeat(["male", "female"], inner=8), compress=true),
weight=[range(100, 155; length=8); range(100, 125; length=8)] .+ randn(16))
lm = regress(@formula(weight ~ 1 + sex * age), growthdata)
results = predict_in_sample(lm, growthdata, req_stats="all")
fp = select(results, [:age, :weight, :sex, :uclp, :lclp, :predicted]) |> @vlplot() +
@vlplot(
mark = :errorband, color = :sex,
y = { field = :uclp, type = :quantitative, title="Average marginal effects"},
y2 = { field = :lclp, type = :quantitative },
x = {:age, type = :quantitative} ) +
@vlplot(
mark = :line, color = :sex,
x = {:age, type = :quantitative},
y = {:predicted, type = :quantitative}) +
@vlplot(
:point, color=:sex ,
x = {:age, type = :quantitative, axis = {grid = false}, scale = {zero = false}},
y = {:weight, type = :quantitative, axis = {grid = false}, scale = {zero = false}},
title = "Title", width = 400 , height = 400
)
which gives:
You can change the style of the elements by changing the "config" as indicated here (https://www.queryverse.org/VegaLite.jl/stable/gettingstarted/tutorial/#Config-1).
As the Julia Vega-Lite is a wrapper to Vega-Lite additional documentation can be found on the Vega-lite website (https://vega.github.io/vega-lite/)

Problem with alignment of geom_point and geom_errorbar

I am trying to plot how different predictors associate with stroke and underlying phenotypes (i.e. cholesterol). I originally had working ggplot code in which shape denoted the different variables (stroke, HDL cholesterol and total cholesterol) and colour denoted type, i.e. disease (stroke) or phenotype (HDL/total cholesterol). To make it more intuitive, I want to swap shape and colour around, but now that I do this, I am having issues with position dodge and the alignment of geom_point and geom_errorbar.
stroke_graph <- ggplot(stroke,aes(y=as.numeric(stroke$test),
x=Clock,
shape = Type,
colour = Variable)) +
geom_point(data=stroke, aes(shape=Type, colour=Variable), show.legend=TRUE,
position=position_dodge(width=0.5), size = 3) +
geom_errorbar(aes(ymin = as.numeric(stroke$LCI), ymax= as.numeric(stroke$UCI)),
position = position_dodge(0.5), width = 0.05,
colour ="black")+
ylab("standardised beta/log odds")+ xlab ("")+
geom_hline(yintercept = 0, linetype = "dotted")+
theme(axis.text.x = element_text(size = 10, vjust = 0.5), legend.position = "none",
plot.title = element_text(size = 12))+
scale_y_continuous(limit = c(-0.402, 0.7))+ scale_shape_manual(values=c(15, 17, 18))+
theme(legend.position="right") + labs(shape = "Variable") + guides(shape = guide_legend(reverse=TRUE)) +
coord_flip()
stroke_graph + ggtitle("Stroke and Associated Phenotypes") + theme(plot.title = element_text(hjust = 0.5))
Graph now: 1
Previously working graph - only difference in code is swapping "Type" and "Variable": 2
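One possible explanation, offered as my reading of how ggplot2 forms groups rather than as a confirmed diagnosis: position_dodge() dodges by group, and when no group is given, each layer derives it from the discrete variables still mapped in that layer. Setting colour = "black" on geom_errorbar() drops the colour mapping there, so after the swap the error bars are grouped by Type only, while the points are grouped by Type and Variable. A minimal sketch of the usual remedy, an explicit group shared by both layers, on a made-up stand-in (stroke_toy, with invented values) for the stroke data:

```r
library(ggplot2)

# Hypothetical stand-in for the `stroke` data frame; all values are invented
stroke_toy <- data.frame(
  Clock    = rep(c("Clock A", "Clock B"), each = 3),
  Variable = rep(c("Stroke", "HDL cholesterol", "Total cholesterol"), times = 2),
  Type     = rep(c("Disease", "Phenotype", "Phenotype"), times = 2),
  test     = c(0.30, -0.10, 0.05, 0.20, -0.20, 0.10),
  LCI      = c(0.10, -0.25, -0.10, 0.05, -0.35, -0.05),
  UCI      = c(0.50, 0.05, 0.20, 0.35, -0.05, 0.25)
)

# Bare column names inside aes() (no stroke$...), plus an explicit group
# so that both layers are dodged over the same three groups
p_fixed <- ggplot(stroke_toy,
                  aes(x = Clock, y = test, shape = Type, colour = Variable,
                      group = Variable)) +
  geom_errorbar(aes(ymin = LCI, ymax = UCI),
                position = position_dodge(width = 0.5), width = 0.05,
                colour = "black") +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  labs(x = "", y = "standardised beta/log odds") +
  coord_flip()
p_fixed
```

With the shared group, the dodged x positions of the error bars and the points coincide; the remaining scales and theme settings from the question can be added unchanged.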

ggplot2: How to move y axis labels right next to the bars

I am working with the following reproducible dataset:
df<- data.frame(name=c(letters[1:10],letters[1:10]),fc=runif(20,-5,5)
,fdr=runif(20),group=c(rep("gene",10),rep("protein",10)))
Code used to plot:
df$sig<- ifelse(df$fdr<0.05 & df$fdr>0 ,"*","")
ggplot(df, aes(x=reorder(name,fc),fc))+geom_col(aes(fill=group),position = "dodge",width = 0.9)+
coord_flip()+
geom_text(aes(label = sig),angle = 90, position = position_stack(vjust = -0.2), color= "black",size=3)+
scale_y_continuous(position = "right")+
scale_fill_manual(values = c("gene"= "#FF002B","protein"="blue"))+
geom_hline(yintercept = 0, colour = "gray" )+
theme(legend.position="none", axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_text(),
axis.line=element_line(color="gray"),axis.line.y=element_blank(),
axis.ticks.y=element_blank(),
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
Resulting in following plot:
Instead of having the y-axis labels on the left side, I would like to place them right next to the bars. I want to emulate this chart published in Nature:
https://www.nature.com/articles/ncomms2112/figures/3
Like this?
df<- data.frame(name=c(letters[1:10],letters[1:10]),fc=runif(20,-5,5)
,fdr=runif(20),group=c(rep("gene",10),rep("protein",10)))
df$sig<- ifelse(df$fdr<0.05 & df$fdr>0 ,"*","")
df$try<-c(1:10,1:10) #assign numbers to letters
x_pos<-ifelse(df$group=='gene',df$try-.2,df$try+.2) #align letters over bars
y_posneg<-ifelse(df$fc>0,df$fc+.5,df$fc-.5) #set up y axis position of letters
ggplot(df, aes(x=try,fc))+geom_col(aes(fill=group),position = "dodge",width = 0.9)+
coord_flip()+
geom_text(aes(y=y_posneg,x=x_pos,label = name),color= "black",size=6)+
scale_y_continuous(position = "right")+
scale_fill_manual(values = c("gene"= "#FF002B","protein"="blue"))+
geom_hline(yintercept = 0, colour = "gray" )+
theme(legend.position="none", axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_blank(),
axis.line=element_line(color="gray"),axis.line.y=element_blank(),
axis.ticks.y=element_blank(),
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
Gives:
Or perhaps this?
x_pos<-ifelse(df$group=='gene',df$try-.2,df$try+.2) #align letters over bars
y_pos<-ifelse(df$fc>0,-.2,.2) #set up y axis position of letters
ggplot(df, aes(x=try,fc))+geom_col(aes(fill=group),position = "dodge",width = 0.9)+
coord_flip()+
geom_text(aes(y=y_pos,x=x_pos,label = name),color= "black",size=3)+
scale_y_continuous(position = "right")+
scale_fill_manual(values = c("gene"= "#FF002B","protein"="blue"))+
geom_hline(yintercept = 0, colour = "gray" )+
theme(legend.position="none", axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_blank(),
axis.line=element_line(color="gray"),axis.line.y=element_blank(),
axis.ticks.y=element_blank(),
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
Gives:

Is it possible to have 2 legends for variables when one is continuous and the other is discrete?

I checked a few examples online and I am not sure that it can be done because every plot with 2 different variables (continuous and discrete) has one of 2 options:
legend regarding the continuous variable
legend regarding the discrete variable
Just for visualization, I put here an example. Imagine that I want to have a legend for the blue line. Is it possible to do that??
The easiest approach would be to map it to a different aesthetic than you already use:
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
geom_smooth(method = "loess", aes(linetype = "fit"))
There are also specialised packages for adding additional colour legends:
library(ggplot2)
library(ggnewscale)
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
new_scale_colour() +
geom_smooth(method = "loess", aes(colour = "fit"))
Beware that if you want to tweak colours via a colourscale, you must first add these before calling the new_scale_colour(), i.e.:
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
scale_colour_manual(values = c("red", "green", "blue")) +
new_scale_colour() +
geom_smooth(method = "loess", aes(colour = "fit")) +
scale_colour_manual(values = "purple")
EDIT: To address the comment: yes, it is possible with a line that is independent of the data; I was just re-using the data to keep the example brief. See below for an arbitrary line (this should also work with the ggnewscale approach):
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
geom_line(data = data.frame(x = 1:30, y = rnorm(10, 200, 10)),
aes(x, y, linetype = "arbitrary line"))