Add sample size to a panel figure of boxplots - ggplot2

I am trying to add sample sizes to boxplots (preferably at the top or bottom of each box) that are grouped by two levels. I used the facet_grid() function to produce a panel plot. I then tried to use the annotate() function to add the sample sizes, but this didn't work because it repeated the same values in the second panel. Is there a simple way to do this?
head(FeatherData, n=10)
Location Status FeatherD Species ID
## 1 TX Resident -27.41495 Carolina wren CARW (32)
## 2 TX Resident -29.17626 Carolina wren CARW (32)
## 3 TX Resident -31.08070 Carolina wren CARW (32)
## 4 TX Migrant -169.19579 Yellow-rumped warbler YRWA (28)
## 5 TX Migrant -170.42079 Yellow-rumped warbler YRWA (28)
## 6 TX Migrant -158.66925 Yellow-rumped warbler YRWA (28)
## 7 TX Migrant -165.55278 Yellow-rumped warbler YRWA (28)
## 8 TX Migrant -170.43374 Yellow-rumped warbler YRWA (28)
## 9 TX Migrant -170.21801 Yellow-rumped warbler YRWA (28)
## 10 TX Migrant -184.45871 Yellow-rumped warbler YRWA (28)
ggplot(FeatherData, aes(x = Location, y = FeatherD)) +
geom_boxplot(alpha = 0.7, fill='#A4A4A4') +
scale_y_continuous() +
scale_x_discrete(name = "Location") +
theme_bw() +
theme(plot.title = element_text(size = 20, family = "Times", face =
"bold"),
text = element_text(size = 20, family = "Times"),
axis.title = element_text(face="bold"),
axis.text.x=element_text(size = 15)) +
ylab(expression(Feather~delta^2~H["f"]~"‰")) +
facet_grid(. ~ Status)

There are multiple ways to do this sort of task. The most flexible way is to compute your statistic outside the plotting call as a separate data frame and use it as its own layer:
library(dplyr)
library(ggplot2)
cw_summary <- ChickWeight %>%
group_by(Diet) %>%
tally()
cw_summary
# A tibble: 4 x 2
Diet n
<fctr> <int>
1 1 220
2 2 120
3 3 120
4 4 118
ggplot(ChickWeight, aes(Diet, weight)) +
geom_boxplot() +
facet_grid(~Diet) +
geom_text(data = cw_summary,
aes(Diet, Inf, label = n), vjust = 1)
The other method is to use the built-in summary functions, but that can be fiddly. Here's an example:
ggplot(ChickWeight, aes(Diet, weight)) +
geom_boxplot() +
stat_summary(fun = median, fun.max = length,
geom = "text", aes(label = after_stat(ymax)), vjust = -1) +
facet_grid(~Diet)
Here I used fun to position the summary at the median of the y values, and used fun.max to compute an internal variable called ymax with the function length (which just counts the number of observations). The label aesthetic then picks up that count through after_stat(ymax). Older ggplot2 code uses fun.y, fun.ymax and ..ymax.. for the same things; those spellings are deprecated in current releases.
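For a figure faceted on a second grouping variable, as in the question, the separate-data-frame approach carries over by counting over both variables. A minimal sketch, assuming the built-in mtcars data as a stand-in (cyl plays the role of Location and am the role of Status; the names cars, n_labels and p are only illustrative):

```r
library(dplyr)
library(ggplot2)

# Stand-in data: cyl is the x-axis group, am is the facet variable
cars <- mtcars %>%
  mutate(cyl = factor(cyl),
         am = factor(am, labels = c("automatic", "manual")))

# One row per box: the count for each facet/group combination
n_labels <- cars %>%
  count(am, cyl)

p <- ggplot(cars, aes(cyl, mpg)) +
  geom_boxplot() +
  # y = Inf pins the label to the top of each panel; vjust nudges it inside
  geom_text(data = n_labels,
            aes(x = cyl, y = Inf, label = paste0("n = ", n)),
            vjust = 1.5) +
  facet_grid(. ~ am)
p
```

Because n_labels carries the facet variable am, each label is drawn only in its own panel, which is what annotate() cannot do.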

Related

Expand margins of ggplot

Apologies for the simplistic question, but I'm having trouble adjusting the size (width) of this plot to include all the data so that it doesn't look so squished. I've tried adjusting the margins and the width in png(), but nothing seems to work.
png("file_name.png", units = "in", width = 10, height = 5, res = 300)
ggplot(pred, aes(x = Longitude, y = Latitude)) +
geom_raster(aes(fill = Fitted)) +
facet_wrap(~ CYR) +
scale_fill_viridis(option = 'plasma',
na.value = 'transparent') +
coord_quickmap() +
theme(legend.position = 'top')
# theme(plot.margin=grid::unit(c(0,20,0,20), "mm"))
dev.off()
Do you need to use coord_quickmap() for some reason? Removing it 'fixes' the plot dimensions, e.g. using the palmerpenguins dataset:
library(ggplot2)
library(palmerpenguins)
p1 <- ggplot(penguins, aes(x = sex,
y = bill_length_mm,
fill = bill_depth_mm)) +
geom_raster() +
scale_fill_viridis_c(option = 'plasma',
na.value = 'transparent') +
facet_wrap(~interaction(island, species, year)) +
theme(legend.position = 'top') +
coord_quickmap()
p1
#> Warning: Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.
#> Warning: Removed 2 rows containing missing values (`geom_raster()`).
p2 <- ggplot(penguins, aes(x = sex,
y = bill_length_mm,
fill = bill_depth_mm)) +
geom_raster() +
scale_fill_viridis_c(option = 'plasma',
na.value = 'transparent') +
facet_wrap(~interaction(island, species, year)) +
theme(legend.position = 'top') #+
# coord_quickmap()
p2
#> Warning: Raster pixels are placed at uneven horizontal intervals and will be shifted
#> ℹ Consider using `geom_tile()` instead.
#> Removed 2 rows containing missing values (`geom_raster()`).
Created on 2023-02-13 with reprex v2.0.2

Not seeing count labels on bar chart in ggplot2

I'd like to add count labels to a simple bar chart. I've tried the following code with and without y=count; it runs without error but doesn't display the labels:
Class <- ggplot(asteroid, aes(x=class, y=count, fill = class, labels = TRUE))
Class +
geom_bar() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '(*) indicates a potentially hazardous object') +
scale_fill_discrete(labels=c('Amor*','Apollo','Apollo*','Aten','Aten*','Interior Earth Object*'))
I have also tried this:
Class <- ggplot(asteroid, aes(x=class, y=count, fill = class))
Class +
geom_bar() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '(*) indicates a potentially hazardous object') +
scale_fill_discrete(labels=c('Amor*','Apollo','Apollo*','Aten','Aten*','Interior Earth Object*'))+
geom_bar(stat = "identity") +
geom_text(aes(label = count), vjust = 0)
but I'm met with the following error:
"Error in `f()`:
! Aesthetics must be valid data columns. Problematic aesthetic(s): y = count.
Did you mistype the name of a data column or forget to add after_stat()?"
Obviously R is interpreting "count" as count(), but I can't seem to find a way to work around this. I have tried using freq instead, Count, etc. Those were long shots, but I'm a bit stumped.
Edit: Here's a snapshot of the character variable I'm working with:
> head(dput(asteroid$class))
> [1] "APO*" "APO*" "APO*" "APO*" "APO*" "APO*"
And the dataset:
head(dput(asteroid))
# A tibble: 6 × 17
Object...1 `Epoch (TDB)` `a (AU)` e `i (deg)` `w (deg)` `Node (deg)` `M (deg)` `q (AU)` `Q (AU)` `P (yr)`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1566 Icarus 57800 1.08 0.827 22.8 31.4 88.0 216. 0.187 1.97 1.12
2 1620 Geographos 57800 1.25 0.335 13.3 277. 337. 104. 0.828 1.66 1.39
3 1862 Apollo 55249 1.47 0.560 6.35 286. 35.7 175. 0.647 2.29 1.78
4 1981 Midas 57800 1.78 0.650 39.8 268. 357. 173. 0.621 2.93 2.37
5 2101 Adonis 57800 1.87 0.765 1.33 43.4 350. 235. 0.441 3.31 2.57
6 2102 Tantalus 57800 1.29 0.299 64.0 61.6 94.4 355. 0.904 1.68 1.47
# … with 6 more variables: `H (mag)` <dbl>, `MOID (AU)` <dbl>, ref <dbl>, class <chr>, Object...16 <chr>,
# Hazardous <dbl>
Updated with your data sample.
asteroid is uncounted, so the first version below adds count(class) and plots the result with geom_col(). The second version leaves the data uncounted and lets geom_bar() do the counting.
library(tidyverse)
asteroid <- data.frame(class = c("apo",'ieo','amor','aten', 'amor'))
# Counted
asteroid |>
count(class) |>
ggplot(aes(class, n, fill = class)) +
geom_col() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '() indicates a potentially hazardous object') +
geom_text(aes(label = n), vjust = 0)
# Uncounted
asteroid |>
ggplot(aes(class, fill = class)) +
geom_bar() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '() indicates a potentially hazardous object') +
geom_text(aes(label = after_stat(count)), stat = 'count', vjust = 0)
Created on 2022-06-06 by the reprex package (v2.0.1)

How can I plot my own data in a grid on an sf map? The result comes out empty

I am trying to summarize some statistics in the grid that I made, however something fails when I try to do it.
This is my data
head(catk)
Simple feature collection with 6 features and 40 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 303.22 ymin: -61.43 xmax: 303.95 ymax: -60.78
Geodetic CRS: WGS 84
# A tibble: 6 × 41
X1 day month year c1_id greenweight_caught_kg obs_haul_id
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 1 4 12 1997 26529 7260 NA
2 2 4 12 1997 26530 7920 NA
3 3 4 12 1997 26531 4692 NA
4 4 4 12 1997 26532 5896 NA
5 5 4 12 1997 26533 88 NA
6 6 5 12 1997 26534 7040 NA
# … with 34 more variables: obs_logbook_id <lgl>, obs_haul_number <lgl>,
# haul_number <dbl>, vessel_name <chr>, vessel_nationality_code <chr>,
# fishing_purpose_code <chr>, season_ccamlr <dbl>,
# target_species <chr>, asd_code <dbl>, trawl_technique <lgl>,
# catchperiod_code <chr>, date_catchperiod_start <date>,
# datetime_set_start <dttm>, datetime_set_end <dttm>,
# datetime_haul_start <dttm>, datetime_haul_end <dttm>, …
and I built this grid
an <- getData("GADM", country = "ATA", level = 0)
an@data$NAME_0
e <- extent(-70,-40,-68,-60)
rc <- crop(an, e)
proj4string(rc) <- CRS("+init=epsg:4326")
rc3 <- st_as_sf(rc)
catk <- st_as_sf(catk, coords = c("Longitude", "Latitude"), crs = 4326) %>%
st_shift_longitude()
Grid <- rc3 %>%
st_make_grid(cellsize = c(1,0.4)) %>% # so that it comes out square
st_cast("MULTIPOLYGON") %>%
st_sf() %>%
mutate(cellid = row_number())
result <- Grid %>%
st_join(catk) %>%
group_by(cellid) %>%
summarise(sum_cat = sum(Catcht))
but I can not represent the data in the grid
ggplot() +
geom_sf(data = Grid, color="#d9d9d9", fill=NA) +
geom_sf(data = rc3) +
theme_bw() +
coord_sf() +
scale_alpha(guide="none")+
xlab(expression(paste(Longitude^o,~'O'))) +
ylab(expression(paste(Latitude^o,~'S')))+
guides( colour = guide_legend()) +
theme(panel.background = element_rect(fill = "#f7fbff"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
theme(legend.position = "right")+
xlim(-69,-45)
[image: the failed plot]
Please help me to find this solution!!
I just noticed that you shifted the coordinates with st_shift_longitude(), which is why your bounding box is:
Bounding box: xmin: 303.22 ymin: -61.43 xmax: 303.95 ymax: -60.78
Do you really need it? It doesn't match your defined extent:
e <- extent(-70,-40,-68,-60)
Also, a bbox in WGS 84 is supposed to lie within c(-180, -90, 180, 90).
In addition, your plot doesn't tell ggplot2 to do anything with the values from catk. Grid and rc3 contain nothing from catk; the result object is the one that holds them.
Anyway, I gave your problem a try even though I don't have access to your dataset. I plot sum_cat from result in each cell:
library(raster)
library(sf)
library(dplyr)
library(ggplot2)
# Mock your data
catk <- structure(list(Longitude = c(-59.0860203764828, -50.1352159580643,
-53.7671292009259, -67.9105254106185, -67.5753491797446, -51.7045571975837,
-45.6203830411619, -61.2695183762776, -51.6287384188695, -52.244074640978,
-45.4625770258213, -51.0935832496694, -45.6375681312716, -44.744215508174,
-66.3625310507564), Latitude = c(-62.0038884948778, -65.307178606448,
-65.8980199769778, -60.4475595973147, -67.7543165093134, -60.4616894158005,
-67.9720957068844, -62.2184680275876, -66.2345680342004, -64.1523442367459,
-62.5435163857161, -65.9127866479611, -66.8874734854608, -62.0859917484373,
-66.8762861503705), Catcht = c(18L, 95L, 32L, 40L, 16L, 49L,
22L, 14L, 86L, 45L, 3L, 51L, 45L, 41L, 19L)), row.names = c(NA,
-15L), class = "data.frame")
# Start the analysis
an <- getData("GADM", country = "ATA", level = 0)
e <- extent(-70,-40,-68,-60)
rc <- crop(an, e)
proj4string(rc) <- CRS("+init=epsg:4326")
rc3 <- st_as_sf(rc)
# Don't think you need st_shift_longitude, removed
catk <- st_as_sf(catk, coords = c("Longitude", "Latitude"), crs = 4326)
Grid <- rc3 %>%
st_make_grid(cellsize = c(1,0.4)) %>% # so that it comes out square
st_cast("MULTIPOLYGON") %>%
st_sf() %>%
mutate(cellid = row_number())
result <- Grid %>%
st_join(catk) %>%
group_by(cellid) %>%
summarise(sum_cat = sum(Catcht))
ggplot() +
geom_sf(data = Grid, color="#d9d9d9", fill=NA) +
# Add here results with my mock data by grid
geom_sf(data = result %>% filter(!is.na(sum_cat)), aes(fill=sum_cat)) +
geom_sf(data = rc3) +
theme_bw() +
coord_sf() +
scale_alpha(guide="none")+
xlab(expression(paste(Longitude^o,~'O'))) +
ylab(expression(paste(Latitude^o,~'S')))+
guides( colour = guide_legend()) +
theme(panel.background = element_rect(fill = "#f7fbff"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
theme(legend.position = "right")+
xlim(-69,-45)
Created on 2022-03-23 by the reprex package (v2.0.1)
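The bounding-box mismatch described above can be reproduced on a single point. A small sketch, assuming only the sf package and a made-up point chosen so that its shifted longitude matches the xmin of 303.22 in the question (pt and shifted are illustrative names):

```r
library(sf)

# A single point at 56.78 W, 61.43 S in WGS 84
pt <- st_sfc(st_point(c(-56.78, -61.43)), crs = 4326)

# st_shift_longitude() moves longitudes from [-180, 180] to [0, 360]
shifted <- st_shift_longitude(pt)

st_bbox(pt)       # longitude stays negative
st_bbox(shifted)  # longitude becomes about 303.22
```

The shifted value no longer falls inside the -70 to -40 longitude range used for extent() and xlim() in the question.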

How to solve simple linear programming problem with lpSolve

I am trying to maximize the function $a_1x_1 + \cdots +a_nx_n$ subject to the constraints $b_1x_1 + \cdots + b_nx_n \leq c$ and $x_i \geq 0$ for all $i$. For the toy example below, I've chosen $a_i = b_i$, so the problem is to maximize $0x_1 + 25x_2 + 50x_3 + 75x_4 + 100x_5$ given $0x_1 + 25x_2 + 50x_3 + 75x_4 + 100x_5 \leq 100$. Trivially, the maximum value of the objective function should be 100, but when I run the code below I get a solution of 2.5e+31. What's going on?
library(lpSolve)
a <- seq.int(0, 100, 25)
b <- seq.int(0, 100, 25)
c <- 100
optimal_val <- lp(direction = "max",
objective.in = a,
const.mat = b,
const.dir = "<=",
const.rhs = c,
all.int = TRUE)
optimal_val
b is not a proper matrix. You should do, before the lp call:
b <- seq.int(0, 100, 25)
b <- matrix(b,nrow=1)
That will give you an explicit 1 x 5 matrix:
> b
[,1] [,2] [,3] [,4] [,5]
[1,] 0 25 50 75 100
Now you will see:
> optimal_val
Success: the objective function is 100
Background: by default, matrix() turns a vector into a single-column matrix:
> matrix(c(1,2,3))
[,1]
[1,] 1
[2,] 2
[3,] 3
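The difference in shape can be checked directly with dim(). A short sketch, reusing the values from the question and assuming lpSolve is installed (b_col, b_row and res are illustrative names):

```r
library(lpSolve)

a <- seq.int(0, 100, 25)
b <- seq.int(0, 100, 25)

b_col <- matrix(b)            # default: five rows, one column
b_row <- matrix(b, nrow = 1)  # explicit: one constraint row, five variables

dim(b_col)
dim(b_row)

# With the 1 x 5 constraint matrix, lp() sees one constraint on five variables
res <- lp(direction = "max",
          objective.in = a,
          const.mat = b_row,
          const.dir = "<=",
          const.rhs = 100,
          all.int = TRUE)
res$objval
```

Each row of const.mat is one constraint and each column is one decision variable, so the number of columns has to match the length of objective.in.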

Tensorflow - Same Padding Calculation

I have the following parameters:
in_height = 28
in_width = 28
stride (s) = 2
padding (p) = 'SAME'
The idea of 'SAME' padding is that when s = 1, the input map and output map dimensions (height, width) should remain the same.
So I should be able to get the padding size from the following (the filter size is 5):
(28 + 2*p - 5) + 1 = 28
Solving gives p = 2, which means there should be a padding of 2 on each side.
Using p = 2 with s = 2, the output map size would be:
floor((28 + 4 - 5)/2) + 1 = 14
From Tensorflow documentation, Same Padding:
out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))
pad_along_height = max((out_height - 1) * strides[1] +
filter_height - in_height, 0)
pad_along_width = max((out_width - 1) * strides[2] +
filter_width - in_width, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
To follow the above:
out_height = ceil(28.0/2.0) = 14.0
out_width = ceil(28.0/2.0) = 14.0
Hence
pad_along_height = max((14.0 -1)*2 + 5 - 28,0) = 3
pad_along_width = max((14.0 -1)*2 + 5 - 28,0) = 3
pad_top = pad_along_height // 2 = 3 // 2 = 1
pad_bottom = pad_along_height - pad_top = 3 - 1 = 2
pad_left = pad_along_width // 2 = 3 // 2 = 1
pad_right = pad_along_width - pad_left = 3 - 1 = 2
So does this mean that the image should be padded by 1 on the top and 2 on the bottom, and similarly by 1 on the left and 2 on the right?
Looking at the TensorFlow documentation, it actually confirms this:
Note that the division by 2 means that there might be cases when the
padding on both sides (top vs bottom, right vs left) are off by one.
In this case, the bottom and right sides always get the one additional
padded pixel. For example, when pad_along_height is 5, we pad 2 pixels
at the top and 3 pixels at the bottom. Note that this is different
from existing libraries such as cuDNN and Caffe, which explicitly
specify the number of padded pixels and always pad the same number of
pixels on both sides.
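The arithmetic from the quoted documentation can be wrapped in a small helper to check the numbers above. This is a sketch of the formulas only, written in plain R with no TensorFlow calls; same_padding() is a hypothetical helper name, and the filter size of 5 is the value used in the question's calculations:

```r
# Compute 'SAME' padding along one dimension from the documented formulas
same_padding <- function(in_size, filter_size, stride) {
  out_size <- ceiling(in_size / stride)
  pad_total <- max((out_size - 1) * stride + filter_size - in_size, 0)
  pad_before <- pad_total %/% 2          # top or left
  pad_after <- pad_total - pad_before    # bottom or right gets the extra pixel
  c(out = out_size, total = pad_total, before = pad_before, after = pad_after)
}

same_padding(28, 5, 2)
same_padding(28, 5, 1)
```

For stride 1 this gives a total padding of 4, split 2 and 2, which agrees with p = 2 above. For stride 2 it gives a total of 3, split 1 before and 2 after.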