Using subplot with seaborn - matplotlib

I use the following code to show several Seaborn plots in one window:
fig, axs = plt.subplots(2,2, figsize=(6,6))
x1 = sns.jointplot(x="MO2: Vit. vent 1 10min [m/s] ", y="MO2: Puissance active 10min [kW] ", data=df_MO1_MO2.loc[df_MO1_MO2["Year"] == 2011], kind='kde', ax=axs[0][0])
x2 = sns.jointplot(x="MO2: Vit. vent 1 10min [m/s] ", y="MO2: Puissance active 10min [kW] ", data=df_MO1_MO2.loc[df_MO1_MO2["Year"] == 2012], kind='kde', ax=axs[0][1])
x2 = sns.jointplot(x="MO2: Vit. vent 1 10min [m/s] ", y="MO2: Puissance active 10min [kW] ", data=df_MO1_MO2.loc[df_MO1_MO2["Year"] == 2016], kind='kde', ax=axs[1][0])
x3 = sns.jointplot(x="MO2: Vit. vent 1 10min [m/s] ", y="MO2: Puissance active 10min [kW] ", data=df_MO1_MO2.loc[df_MO1_MO2["Year"] == 2017], kind='kde', ax=axs[1][1])
fig.suptitle('Active power evolution', position=(.5,1.1), fontsize=20)
fig.tight_layout()
This code correctly returns a 2-by-2 grid of subplots, but it also shows the four plots separately, on four different rows. Can you help me find the error in my code?

Dirichlet regression coefficients

I am starting from this example of Dirichlet regression.
My response y is a vector of N = 3 components, and the Dirichlet regression model estimates only N-1 sets of coefficients.
Suppose I am interested in the coefficients for all 3 components; how can I get them?
Thanks!
library(brms)
library(rstan)
library(dplyr)
bind <- function(...) cbind(...)
N <- 20
df <- data.frame(
y1 = rbinom(N, 10, 0.5), y2 = rbinom(N, 10, 0.7),
y3 = rbinom(N, 10, 0.9), x = rnorm(N)
) %>%
mutate(
size = y1 + y2 + y3,
y1 = y1 / size,
y2 = y2 / size,
y3 = y3 / size
)
df$y <- with(df, cbind(y1, y2, y3))
make_stancode(bind(y1, y2, y3) ~ x, df, dirichlet())
make_standata(bind(y1, y2, y3) ~ x, df, dirichlet())
fit <- brm(bind(y1, y2, y3) ~ x, df, dirichlet())
summary(fit)
Family: dirichlet
Links: muy2 = logit; muy3 = logit; phi = identity
Formula: bind(y1, y2, y3) ~ x
Data: df (Number of observations: 20)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muy2_Intercept     0.29      0.10     0.10     0.47 1.00     2830     2514
muy3_Intercept     0.56      0.09     0.38     0.73 1.00     2833     2623
muy2_x             0.04      0.11    -0.17     0.24 1.00     3265     2890
muy3_x            -0.00      0.10    -0.20     0.19 1.00     3229     2973
Family Specific Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi    39.85      9.13    23.83    59.78 1.00     3358     2652
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
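As far as I understand the brms dirichlet() family, the first category (y1 here) is the reference: its linear predictor is fixed at 0, which is why only muy2 and muy3 appear in the summary. The expected shares of all three components then come from a softmax over (0, eta2, eta3). The sketch below illustrates that arithmetic with the posterior means from the summary. It is an illustration of the link function, not brms output, and plugging in posterior means is only an approximation to transforming each posterior draw.

```python
import math

def dirichlet_means(x, coefs):
    """Expected shares (mu1, mu2, mu3) at covariate value x.

    coefs holds (intercept, slope) for each non-reference category.
    Assumption: the reference category y1 has its linear predictor fixed at 0.
    """
    etas = [0.0] + [b0 + b1 * x for b0, b1 in coefs]
    exps = [math.exp(eta) for eta in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Posterior means copied from the summary: (Intercept, x) for muy2 and muy3
coefs = [(0.29, 0.04), (0.56, -0.00)]
mu = dirichlet_means(0.0, coefs)
print([round(m, 3) for m in mu])
```

The three values sum to 1, and the reference category gets a share even though it has no coefficients of its own.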

Not seeing count labels on bar chart in ggplot2

I'd like to add count labels to a simple bar chart. I've tried the following code with and without y=count; it runs without error but doesn't display the labels:
Class <- ggplot(asteroid, aes(x=class, y=count, fill = class, labels = TRUE))
Class +
geom_bar() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '(*) indicates a potentially hazardous object') +
scale_fill_discrete(labels=c('Amor*','Apollo','Apollo*','Aten','Aten*','Interior Earth Object*'))
I have also tried this:
Class <- ggplot(asteroid, aes(x=class, y=count, fill = class))
Class +
geom_bar() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '(*) indicates a potentially hazardous object') +
scale_fill_discrete(labels=c('Amor*','Apollo','Apollo*','Aten','Aten*','Interior Earth Object*'))+
geom_bar(stat = "identity") +
geom_text(aes(label = count), vjust = 0)
but I'm met with the following error:
"Error in `f()`:
! Aesthetics must be valid data columns. Problematic aesthetic(s): y = count.
Did you mistype the name of a data column or forget to add after_stat()?"
Obviously R takes "count" to mean count(), but I can't find a way to work around this. I have tried using freq, Count, etc. instead. Those were long shots, but I'm a bit stumped.
Edit: Here's a snapshot of the character variable I'm working with:
> head(dput(asteroid$class))
> [1] "APO*" "APO*" "APO*" "APO*" "APO*" "APO*"
And the dataset:
head(dput(asteroid))
# A tibble: 6 × 17
Object...1 `Epoch (TDB)` `a (AU)` e `i (deg)` `w (deg)` `Node (deg)` `M (deg)` `q (AU)` `Q (AU)` `P (yr)`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1566 Icarus 57800 1.08 0.827 22.8 31.4 88.0 216. 0.187 1.97 1.12
2 1620 Geographos 57800 1.25 0.335 13.3 277. 337. 104. 0.828 1.66 1.39
3 1862 Apollo 55249 1.47 0.560 6.35 286. 35.7 175. 0.647 2.29 1.78
4 1981 Midas 57800 1.78 0.650 39.8 268. 357. 173. 0.621 2.93 2.37
5 2101 Adonis 57800 1.87 0.765 1.33 43.4 350. 235. 0.441 3.31 2.57
6 2102 Tantalus 57800 1.29 0.299 64.0 61.6 94.4 355. 0.904 1.68 1.47
# … with 6 more variables: `H (mag)` <dbl>, `MOID (AU)` <dbl>, ref <dbl>, class <chr>, Object...16 <chr>,
# Hazardous <dbl>
Updated with your data sample.
asteroid is not pre-counted, so the first version adds count(class) before plotting. The second version passes the raw data and lets geom_bar() do the counting.
library(tidyverse)
asteroid <- data.frame(class = c("apo",'ieo','amor','aten', 'amor'))
# Counted
asteroid |>
count(class) |>
ggplot(aes(class, n, fill = class)) +
geom_col() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '(*) indicates a potentially hazardous object') +
geom_text(aes(label = n), vjust = 0)
# Uncounted
asteroid |>
ggplot(aes(class, fill = class)) +
geom_bar() +
ggtitle('Frequency of Hazardous and Non-Hazardous Asteroids by Class') +
xlab('Class Based on Solar System Location') +
labs(caption = '(*) indicates a potentially hazardous object') +
geom_text(aes(label = after_stat(count)), stat = 'count', vjust = 0)
Created on 2022-06-06 by the reprex package (v2.0.1)

How can I plot my own data on a grid in an sf map? The result comes back empty

I am trying to summarize some statistics on the grid that I made, but something fails when I try to do it.
This is my data
head(catk)
Simple feature collection with 6 features and 40 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 303.22 ymin: -61.43 xmax: 303.95 ymax: -60.78
Geodetic CRS: WGS 84
# A tibble: 6 × 41
X1 day month year c1_id greenweight_caught_kg obs_haul_id
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 1 4 12 1997 26529 7260 NA
2 2 4 12 1997 26530 7920 NA
3 3 4 12 1997 26531 4692 NA
4 4 4 12 1997 26532 5896 NA
5 5 4 12 1997 26533 88 NA
6 6 5 12 1997 26534 7040 NA
# … with 34 more variables: obs_logbook_id <lgl>, obs_haul_number <lgl>,
# haul_number <dbl>, vessel_name <chr>, vessel_nationality_code <chr>,
# fishing_purpose_code <chr>, season_ccamlr <dbl>,
# target_species <chr>, asd_code <dbl>, trawl_technique <lgl>,
# catchperiod_code <chr>, date_catchperiod_start <date>,
# datetime_set_start <dttm>, datetime_set_end <dttm>,
# datetime_haul_start <dttm>, datetime_haul_end <dttm>, …
and I built this grid:
an <- getData("GADM", country = "ATA", level = 0)
an@data$NAME_0
e <- extent(-70,-40,-68,-60)
rc <- crop(an, e)
proj4string(rc) <- CRS("+init=epsg:4326")
rc3 <- st_as_sf(rc)
catk <- st_as_sf(catk, coords = c("Longitude", "Latitude"), crs = 4326) %>%
st_shift_longitude()
Grid <- rc3 %>%
st_make_grid(cellsize = c(1,0.4)) %>% # so that the cells come out square
st_cast("MULTIPOLYGON") %>%
st_sf() %>%
mutate(cellid = row_number())
result <- Grid %>%
st_join(catk) %>%
group_by(cellid) %>%
summarise(sum_cat = sum(Catcht))
but I cannot display the data on the grid:
ggplot() +
geom_sf(data = Grid, color="#d9d9d9", fill=NA) +
geom_sf(data = rc3) +
theme_bw() +
coord_sf() +
scale_alpha(guide="none")+
xlab(expression(paste(Longitude^o,~'O'))) +
ylab(expression(paste(Latitude^o,~'S')))+
guides( colour = guide_legend()) +
theme(panel.background = element_rect(fill = "#f7fbff"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
theme(legend.position = "right")+
xlim(-69,-45)
Please help me find a solution!
I just noticed that you shifted the coordinates with st_shift_longitude(), which is why your bounding box is:
Bounding box: xmin: 303.22 ymin: -61.43 xmax: 303.95 ymax: -60.78
Do you really need that? It doesn't match your defined extent:
e <- extent(-70,-40,-68,-60)
A bounding box in WGS 84 is supposed to lie within c(-180, -90, 180, 90).
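To make the mismatch concrete, here is a small language-neutral sketch (in Python, with hypothetical helper names) of the two longitude conventions. Shifting to the 0 to 360 range is what turns a longitude of -56.78 into the 303.22 seen in the bounding box, which then falls far outside an extent defined between -70 and -40.

```python
def to_0_360(lon):
    """Map a longitude in degrees to the [0, 360) convention."""
    return lon % 360.0

def to_pm180(lon):
    """Map a longitude in degrees back to the [-180, 180) convention."""
    return (lon + 180.0) % 360.0 - 180.0

shifted = to_0_360(-56.78)      # about 303.22, as in the shifted bounding box
restored = to_pm180(303.22)     # about -56.78, inside the extent -70..-40
print(round(shifted, 2), round(restored, 2))
```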
Also, your plot does not tell ggplot2 to do anything with the values of catk. Grid and rc3 contain nothing from catk; the result object does.
Anyway, I gave your problem a try even though I don't have access to your dataset. I draw sum_cat from result in each cell:
library(raster)
library(sf)
library(dplyr)
library(ggplot2)
# Mock your data
catk <- structure(list(Longitude = c(-59.0860203764828, -50.1352159580643,
-53.7671292009259, -67.9105254106185, -67.5753491797446, -51.7045571975837,
-45.6203830411619, -61.2695183762776, -51.6287384188695, -52.244074640978,
-45.4625770258213, -51.0935832496694, -45.6375681312716, -44.744215508174,
-66.3625310507564), Latitude = c(-62.0038884948778, -65.307178606448,
-65.8980199769778, -60.4475595973147, -67.7543165093134, -60.4616894158005,
-67.9720957068844, -62.2184680275876, -66.2345680342004, -64.1523442367459,
-62.5435163857161, -65.9127866479611, -66.8874734854608, -62.0859917484373,
-66.8762861503705), Catcht = c(18L, 95L, 32L, 40L, 16L, 49L,
22L, 14L, 86L, 45L, 3L, 51L, 45L, 41L, 19L)), row.names = c(NA,
-15L), class = "data.frame")
# Start the analysis
an <- getData("GADM", country = "ATA", level = 0)
e <- extent(-70,-40,-68,-60)
rc <- crop(an, e)
proj4string(rc) <- CRS("+init=epsg:4326")
rc3 <- st_as_sf(rc)
# Don't think you need st_shift_longitude, removed
catk <- st_as_sf(catk, coords = c("Longitude", "Latitude"), crs = 4326)
Grid <- rc3 %>%
st_make_grid(cellsize = c(1,0.4)) %>% # so that the cells come out square
st_cast("MULTIPOLYGON") %>%
st_sf() %>%
mutate(cellid = row_number())
result <- Grid %>%
st_join(catk) %>%
group_by(cellid) %>%
summarise(sum_cat = sum(Catcht))
ggplot() +
geom_sf(data = Grid, color="#d9d9d9", fill=NA) +
# Add here results with my mock data by grid
geom_sf(data = result %>% filter(!is.na(sum_cat)), aes(fill=sum_cat)) +
geom_sf(data = rc3) +
theme_bw() +
coord_sf() +
scale_alpha(guide="none")+
xlab(expression(paste(Longitude^o,~'O'))) +
ylab(expression(paste(Latitude^o,~'S')))+
guides( colour = guide_legend()) +
theme(panel.background = element_rect(fill = "#f7fbff"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
theme(legend.position = "right")+
xlim(-69,-45)
Created on 2022-03-23 by the reprex package (v2.0.1)

Map of New York State counties with binned colors and legend

I am trying to make a county-level map of the state of New York. I would like to color each county based on their level of unionization. I need the map and legend to have four discrete colors of red, rather than a red gradient. I need the legend to display these four different colors with non-overlapping labels/ranges (e.g. 0-25; 26-50; 51-75; 76-100).
Here is my data:
fips unionized
1 36001 33.33333
2 36005 86.11111
3 36007 0.00000
4 36017 0.00000
5 36021 0.00000
6 36027 66.66667
7 36029 40.00000
8 36035 50.00000
9 36039 0.00000
10 36047 82.85714
11 36051 0.00000
12 36053 100.00000
13 36055 30.76923
14 36057 0.00000
15 36059 84.37500
16 36061 81.81818
17 36063 60.00000
18 36065 50.00000
19 36067 71.42857
20 36069 0.00000
21 36071 55.55556
22 36073 0.00000
23 36079 100.00000
24 36081 92.15686
25 36083 50.00000
26 36085 100.00000
27 36087 87.50000
28 36101 0.00000
29 36103 63.88889
30 36105 0.00000
31 36107 0.00000
32 36111 50.00000
33 36113 50.00000
34 36115 100.00000
35 36117 0.00000
36 36119 73.33333
37 36121 0.00000
38 36123 0.00000
I have successfully made the map with a gradient of colors, but cannot figure out how to make discrete colors in the map and legend.
Here is my code:
library(usmap)
library(ggplot2)
plot_usmap(regions = "counties", include = c("NY"), data = Z, values = "unionized") +
labs(title = "Percent Unionized", subtitle = "") +
scale_fill_continuous(low = "white", high = "red", na.value="light grey", name = "Unionization") + theme(legend.position = "right")
Thanks!
This could be achieved via scale_fill_binned and guide_bins. Try this:
library(usmap)
library(ggplot2)
plot_usmap(regions = "counties", include = c("NY"), data = Z, values = "unionized") +
labs(title = "Percent Unionized", subtitle = "") +
scale_fill_binned(low = "white", high = "red", na.value="light grey", name = "Unionization", guide = guide_bins(axis = FALSE, show.limits = TRUE)) +
theme(legend.position = "right")
A second option is to bin the variable manually and use scale_fill_manual to set the fill colors. This makes it easy to set the labels and has the advantage that the NAs are added automatically. For the color scale I use colorRampPalette. By default colorRampPalette interpolates in RGB color space; to get fill colors like the ones from scale_fill_binned, add the argument space = "Lab".
library(usmap)
library(ggplot2)
Z$union_bin <- cut_interval(Z$unionized, n = 4, labels = c("0-25", "26-50", "51-75", "76-100"))
plot_usmap(regions = "counties", include = c("NY"), data = Z, values = "union_bin") +
labs(title = "Percent Unionized", subtitle = "") +
scale_fill_manual(values = colorRampPalette(c("white", "red"))(5)[2:5],
na.value="light grey", name = "Unionization") +
theme(legend.position = "right")
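For readers who want to check which label a given county ends up with, the binning step can be sketched outside R. The Python snippet below assumes that cut_interval with n = 4 splits the observed range (0 to 100 here) into four equal-width intervals that are closed on the right, so a value of exactly 50 lands in the "26-50" bin. The helper name union_bin is made up for this illustration.

```python
from bisect import bisect_left

# Interior break points of four equal-width bins over 0..100
BREAKS = [25, 50, 75]
LABELS = ["0-25", "26-50", "51-75", "76-100"]

def union_bin(value):
    """Return the bin label for a percentage, using right-closed intervals."""
    return LABELS[bisect_left(BREAKS, value)]

# A few values taken from the data in the question
for pct in (0.0, 33.33333, 50.0, 66.66667, 86.11111, 100.0):
    print(pct, union_bin(pct))
```

Note that the labels are rounded for display: a value such as 25.5 falls in the second interval even though the label reads "26-50".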

Add sample size to a panel figure of boxplots

I am trying to add sample size to boxplots (preferably at the top or bottom of them) that are grouped by two levels. I used the facet_grid() function to produce a panel plot. I then tried to use the annotate() function to add the sample sizes, however this didn't work because it repeated the values in the second panel. Is there a simple way to do this?
head(FeatherData, n=10)
Location Status FeatherD Species ID
## 1 TX Resident -27.41495 Carolina wren CARW (32)
## 2 TX Resident -29.17626 Carolina wren CARW (32)
## 3 TX Resident -31.08070 Carolina wren CARW (32)
## 4 TX Migrant -169.19579 Yellow-rumped warbler YRWA (28)
## 5 TX Migrant -170.42079 Yellow-rumped warbler YRWA (28)
## 6 TX Migrant -158.66925 Yellow-rumped warbler YRWA (28)
## 7 TX Migrant -165.55278 Yellow-rumped warbler YRWA (28)
## 8 TX Migrant -170.43374 Yellow-rumped warbler YRWA (28)
## 9 TX Migrant -170.21801 Yellow-rumped warbler YRWA (28)
## 10 TX Migrant -184.45871 Yellow-rumped warbler YRWA (28)
ggplot(FeatherData, aes(x = Location, y = FeatherD)) +
geom_boxplot(alpha = 0.7, fill='#A4A4A4') +
scale_y_continuous() +
scale_x_discrete(name = "Location") +
theme_bw() +
theme(plot.title = element_text(size = 20, family = "Times", face =
"bold"),
text = element_text(size = 20, family = "Times"),
axis.title = element_text(face="bold"),
axis.text.x=element_text(size = 15)) +
ylab(expression(Feather~delta^2~H["f"]~"‰")) +
facet_grid(. ~ Status)
There are multiple ways to do this sort of task. The most flexible way is to compute your statistic outside the plotting call as a separate dataframe and use it as its own layer:
library(dplyr)
library(ggplot2)
cw_summary <- ChickWeight %>%
group_by(Diet) %>%
tally()
cw_summary
# A tibble: 4 x 2
  Diet       n
  <fctr> <int>
1 1        220
2 2        120
3 3        120
4 4        118
ggplot(ChickWeight, aes(Diet, weight)) +
geom_boxplot() +
facet_grid(~Diet) +
geom_text(data = cw_summary,
aes(Diet, Inf, label = n), vjust = 1)
The other method is to use the summary functions built in, but that can be fiddly. Here's an example:
ggplot(ChickWeight, aes(Diet, weight)) +
geom_boxplot() +
stat_summary(fun = median, fun.max = length,
geom = "text", aes(label = after_stat(ymax)), vjust = -1) +
facet_grid(~Diet)
Here I used fun to position the summary at the median of the y values, and used fun.max to compute an internal variable called ymax with the function length (which just counts the number of observations). In ggplot2 versions before 3.3.0 these arguments were named fun.y and fun.ymax, and the variable was accessed as ..ymax.. instead of after_stat(ymax).
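The key to the first approach is that the summary table carries the faceting variable, so each panel only receives its own counts. That is presumably why annotate(), which knows nothing about facets, repeated the values. As a rough language-neutral sketch, here is the same per-panel tally in Python for the question's two grouping levels, using only the ten rows shown by head(FeatherData); the tuple layout is an assumption made for this illustration.

```python
from collections import Counter

# (Location, Status) for the ten rows shown in the question
rows = [("TX", "Resident")] * 3 + [("TX", "Migrant")] * 7

# One count per facet (Status) and x position (Location)
sample_sizes = Counter((status, location) for location, status in rows)

for (status, location), n in sorted(sample_sizes.items()):
    print(f"panel={status} x={location} n={n}")
```

In R, the equivalent table would come from grouping by both Status and Location before tally(), and it would be passed to geom_text() as in the ChickWeight example.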