I have an xarray dataset with three dimensions: latitude, longitude and month. The month dimension holds 12 monthly values, numbered 1 to 12. I want to plot a variable of this dataset with the month names (e.g. 'Jan', 'Feb', 'Mar', ...) instead of the numbers.
How can I change the month numbers to month names in the plot?
<xarray.Dataset>
Dimensions: (month: 12, latitude: 501, longitude: 721)
Coordinates:
* longitude (longitude) float64 49.8 49.81 49.82 ... 56.99 57.0
* latitude (latitude) float64 27.0 27.01 27.02 ... 31.99 32.0
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
Sum_monthly_Rain_mm (month, latitude, longitude) float32 dask.array<chunksize=(1, 501, 721), meta=np.ndarray>
Tair_C (month, latitude, longitude) float32 dask.array<chunksize=(1, 501, 721), meta=np.ndarray>
Plots:
temp_rain_mean_months.Tair_C.plot(x='longitude', y='latitude', col='month', col_wrap=4,
levels=[-10, -5, 0, 5, 10, 15, 20, 25, 30, 35, 40]);
There are two ways to do this.
You can iterate through the axes on the plot object returned by da.plot and set the title manually:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
p = da.groupby('time.month').mean(dim='time').plot(col='month', col_wrap=4)
for ax in p.axes.flat:
    current_title = ax.get_title()
    assert current_title[:len('month = ')] == 'month = '
    month_ind = int(current_title[len('month = '):]) - 1
    ax.set_title(months[month_ind])
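The title-parsing step in the loop can be checked on its own, without drawing a plot. The following is a minimal sketch, assuming the facet titles have the form 'month = N' as in the loop above. The helper name `month_title_to_name` is hypothetical, and `calendar.month_abbr` from the standard library stands in for the hand-typed list (it is locale dependent, so the names are only 'Jan', 'Feb', ... under an English or default locale).

```python
import calendar

# calendar.month_abbr[0] is an empty string, so drop it to keep the 12 names.
MONTHS = list(calendar.month_abbr)[1:]


def month_title_to_name(title, prefix="month = "):
    """Hypothetical helper: map a facet title such as 'month = 3' to a month name."""
    if not title.startswith(prefix):
        raise ValueError(f"unexpected title: {title!r}")
    month_number = int(title[len(prefix):])
    if not 1 <= month_number <= 12:
        raise ValueError(f"month out of range: {month_number}")
    return MONTHS[month_number - 1]


# Titles in the same form that the loop reads with ax.get_title().
print([month_title_to_name(f"month = {n}") for n in (1, 2, 12)])
```

Raising an error on an unexpected title plays the same role as the assert in the loop: if the title format ever differs, the mismatch is reported instead of silently mislabelling a panel.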
Or, you could modify the dim on the array prior to plotting:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
da['month_name'] = ('month', months)
da.swap_dims({'month': 'month_name'}).Tair_C.plot(
    x='longitude',
    y='latitude',
    col='month_name',
    col_wrap=4,
    levels=[-10, -5, 0, 5, 10, 15, 20, 25, 30, 35, 40],
)
Prophet users of the world, I hope all is well. I'm having some difficulty with a particular use case, which I'll try to illustrate with the sample data and code below. First, let's generate some sample data so that it is easier to see what I am talking about.
library(data.table)
library(prophet)
library(dplyr)
# one year of months to be used for generating predictions
ds = c('2016-01-01', '2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-01','2016-07-01','2016-08-01','2016-09-01','2016-10-01','2016-11-01','2016-12-01' )
# historical customer counts
y = c (78498,12356,93732,5556,410,10296,9779,744,16407,100484,23954,141398,10575,850,16334,17496,1643,28074,93181,
18770,129968,11590,850,16738,17510,1376,27931,94369,18444,134850,13386,919,19075,18050,1565,31296,112094,27995,
167094,13402,1422,22766,20072,2340,37863,87346,16180,119863,7691,725,16931,12163,1241,25872,87455,16322,116390,
6994,620,13524,11059,990,22188,105473,23652,154145,13520,1008,18857,19209,1632,31105,102252,21284,138779,11670,
918,16078,16679,1257,26755,115033,22415,139835,13965,936,18027,18642,1407,28622,155371,40556,174321,25119,1859,
35326,28844,2962,51582,108817,19158,109864,8693,756,14358,13390,1091,21419)
# the segment channels of the customers
segment_channel = c('Existing_Omni', 'Existing_Retail', 'Existing_Direct', 'NTB_Omni', 'NTB_Retail', 'NTB_Direct', 'React_Omni', 'React_Retail', 'React_Direct')
# an external regressor to be added to the model (in my data there are about 40 of these regressor variables that I would like to add)
flash_sale = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3)
fake_data = merge(ds,segment_channel, all.y=TRUE)
setnames(fake_data, 'x', 'ds')
setnames(fake_data, 'y', 'segment_channel')
nrow(fake_data) # should be 108 rows, the 9 customer segments for each of the months in 2016
# next join the known customer counts, let's say we have them for the first 7 months of the year (values from 2016-08-01 onward are set to NA below)
fake_data = cbind(fake_data, y)
fake_data = cbind(fake_data, flash_sale)
# set some of the y values to NA so we can pretend we are trying to predict them using the ds time series as well as the flash sale values,
# which will be known in advance
fake_data = as.data.table(fake_data)
fake_data$ds = as.Date(fake_data$ds)
fake_data[, y := ifelse(ds >= '2016-08-01', NA, y)]
This code will generate a data set fairly similar to what I am working with, so hopefully you will be able to reproduce what I am doing. There are essentially two things I would like to do with this data. The first is fairly straightforward: I want to add a regressor (like flash_sale in this example) to the prophet model that I create. I can do this fairly easily like so:
christ <- tibble(
holiday = 'christ',
ds = as.Date(c('2016-11-01', '2017-11-01', '2018-11-01',
'2019-11-01')),
lower_window = 0,
upper_window = 1
)
nye <- tibble(
holiday = 'nye',
ds = as.Date(c('2016-11-01', '2017-12-01', '2018-11-01',
'2019-11-01')),
lower_window = 0,
upper_window = 1
)
holidays <- bind_rows(nye, christ)
m <- prophet(holidays = holidays)
m<- add_regressor(m, name = "flash_sale")
m <- fit.prophet(m, fake_data)
forecast <- predict(m, fake_data)
prophet_plot_components(m, forecast)
This should generate a fairly ugly plot, but it's easy to see that this does the trick for the given data, and I could add more lines to add additional regressors. OK, so we're all good so far. The other issue is that I have 9 segment channels, and I don't want to build a separate model for each of them. Luckily I found a pretty good answer on Stack Overflow that accomplishes the grouped prophet prediction: Using Prophet Package to Predict By Group in Dataframe in R
fcst = fake_data %>%
group_by(segment_channel) %>%
do(predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034), make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
dplyr::select(ds, segment_channel, yhat)
fcst
> fcst
# A tibble: 207 x 3
# Groups: segment_channel [9]
ds segment_channel yhat
<dttm> <fct> <dbl>
1 2016-01-01 00:00:00 Existing_Direct 38712.
2 2016-02-01 00:00:00 Existing_Direct 40321.
3 2016-03-01 00:00:00 Existing_Direct 42648.
4 2016-04-01 00:00:00 Existing_Direct 45130.
5 2016-05-01 00:00:00 Existing_Direct 46580.
6 2016-06-01 00:00:00 Existing_Direct 49437.
7 2016-07-01 00:00:00 Existing_Direct 50651.
8 2016-08-01 00:00:00 Existing_Direct 52685.
9 2016-09-01 00:00:00 Existing_Direct 54719.
10 2016-10-01 00:00:00 Existing_Direct 56687.
# ... with 197 more rows
This is more or less exactly what I want! Cool. So now all I have to do is figure out how to get my grouped predictions and my regressors added all in one step. I know I can have multi-line statements inside of do, so this is what I tried in order to get this to work:
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 4
Error in add_regressor(prophet(., holidays = holidays), name = "flash_sale") :
Regressors must be added prior to model fitting.
Darn. It looks like it was running, but something about how I tried to add the regressor wasn't right. Next I tried it this way:
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ prophet(holidays = holidays),
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 4
Call `rlang::last_error()` to see a backtrace
> fcst = fake_data %>%
+ group_by(segment_channel) %>%
+ do(
+ add_regressor(prophet(., holidays = holidays), name = 'flash_sale'),
+ fit.prophet(prophet(. , holidays = holidays)),
+ predict(prophet(., seasonality.mode = 'multiplicative', holidays = holidays, seasonality.prior.scale = 10, changepoint.prior.scale = .034),
+ make_future_dataframe(prophet(.), periods = 11, freq='month'))) %>%
+ dplyr::select(ds, segment_channel, yhat)
Error: Can only supply one unnamed argument, not 3
Call `rlang::last_error()` to see a backtrace
I'm super confused at this point, so I'm hoping someone out on the interwebs knows just the right incantation to get where I'm going.
I'm trying to use a state-space model to estimate population demographics (fecundity, survivorship, population growth, population size). We have 4 different age states.
# J0 = number of individuals aged 0-1
# surv1 = survivorship from 0-1
# J1 = number of individuals aged 1-2
# surv2 = survivorship from 1-2
# J2 = number of individuals aged 2-3
# surv3 = survivorship from 2-3
# J3 = number of individuals aged 3+
# survad = survivorship >3 ("adult")
# Data as vectors (Talek clan from 1988-2013)
# X0 = individuals 0-1 in years
# X1 = individuals 1-2 in years
# X2 = individuals 2-3 in years
# X3 = individuals 3+ in years
# Total = group size
X0 <- c(7, 9, 4, 8, 9, 5, 8, 5, 7, 5, 5, 8, 10, 3, 5, 7, 2, 6, 6, 11, 14, 12, 15, 9, 10)
X1 <- c( 4, 4, 3, 4, 8, 5, 2, 4, 3, 4, 4, 5, 3, 7, 0, 5, 6, 3, 3, 5, 10, 12, 10, 13, 8)
X2 <- c(3, 2, 3, 3, 3, 8, 4, 1, 1, 2, 2, 4, 2, 2, 5, 0, 5, 5, 4, 3, 3, 10, 12, 7, 10)
X3 <- c(18, 16, 13, 16, 29, 29, 26, 22, 21, 18, 16, 15, 16, 15, 11, 14, 9, 12, 16, 18, 21, 23, 33, 32, 31)
Total <- c(32, 31, 23, 31, 49, 47, 40, 32, 32, 29, 27, 32, 31, 27, 21, 26, 22, 26, 29, 37, 48, 57, 70, 61, 59)
Here's the BUGS code:
sink(file = "HyenaIPM_all.txt")
cat("
model {
# Specify the priors for all parameters in the model
N.est[1] ~ dnorm(50, tau.proc)T(0,) # Initial abundance
mean.lambda ~ dunif(0, 5)
sigma.proc ~ dunif(0, 50)
tau.proc <- pow(sigma.proc, -2)
for (t in 1:TT) {
fec[t] ~ dunif(0, 5) # per capita fecundity
surv1[t] ~ dunif(0, 1) # survivorship from 0-1
surv2[t] ~ dunif(0, 1) # survivorship from 1-2
surv3[t] ~ dunif(0, 1) # survivorship from 2-3
survad[t] ~ dunif(0, 1) # adult survivorship
}
# Estimate fecundity and survivorship
for (t in 2:TT) {
# Fecundity
J0[t+1] ~ dpois(survad[t]*fec[t])
J0[t+1] <- J3[t] * fec[t]
# Survivorship
J1[t+1] ~ dbin(surv1[t], J0[t])
J1[t+1] <- J0[t]*surv1[t]
J2[t+1] ~ dbin(surv2[t], J1[t])
J2[t+1] <- J1[t]*surv2[t]
J3[t+1] ~ dbin(surv3[t], J2[t-1])
J3[t+1] <- J2[t]*surv3[t] + J3[t]*survad[t]
A[t+1] ~ dbin(survad[t], A[t])
A[t+1] <- J3[t]*surv3[t] + A[t]*survad[t]
# Lambda
lambda[t+1] ~ dnorm(mean.lambda, tau.proc)
N.est[t+1] <- N.est[t]*lambda[t]
}
# Population size
for (t in 1:TT){
N[t] ~ dpois(N.est[t])
}
}
", fill = T)
sink()
# Parameters monitored
sp.params <- c("fec", "surv1", "surv2", "surv3", "survad", "lambda")
# MCMC settings
ni <- 200
nt <- 10
nb <- 100
nc <- 3
# Initial values
sp.inits <- function()list(mean.lambda = runif(1, 0, 1))
#Load all the data
sp.data <- list(N = Total, TT = length(Total), J0 = X0, J1 = X1, J2 = X2, J3 = X3)
library(R2jags)
hyena_model <- jags(sp.data, sp.inits, sp.params, "HyenaIPM_all.txt", n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb)
Unfortunately, I get the following error when I run the code.
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, :
RUNTIME ERROR:
Index out of range for node J0
Does anyone have any suggestions for why we get this error? Not sure why the distribution would be wrong for J0.
The error message is quite informative here. Inside the loop for (t in 2:TT), J0 is indexed with t+1, so the index runs from 2+1 = 3 up to TT+1. But J0 is supplied as data with length TT, so on the last iteration J0[TT+1] is out of range.
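The index arithmetic can be verified in a few lines. This is a small sketch, written in Python purely for illustration, that lists the 1-based indices the BUGS loop touches. The helper `written_indices` is hypothetical, and the alternative loop bound 1:(TT-1) is an assumption about one way to keep t+1 within range, not something the model author has confirmed is the right fix.

```python
TT = 25  # length of Total (and of X0..X3) in the question


def written_indices(t_start, t_stop, offset=1):
    """1-based indices touched by `for (t in t_start:t_stop) J0[t + offset]`."""
    return [t + offset for t in range(t_start, t_stop + 1)]


original = written_indices(2, TT)     # loop as written: t in 2:TT
shifted = written_indices(1, TT - 1)  # assumed alternative: t in 1:(TT-1)

# Any index above TT falls outside a data vector of length TT.
print([i for i in original if i > TT])
print([i for i in shifted if i > TT])
```

With the loop as written, the only offending index is TT+1, which matches the node named in the error. The same check applies to every other node indexed with t+1 in that loop.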