Error in as.data.frame.default, class coercion 'structure("RasterStack", package = "raster")' in data.frame not possible - dataframe

Hello everybody,
I get an error when I try to use the predict function. I'm doing a habitat suitability study.
This function requires the model (in this case a GLMM), which I fitted on trimmed variables, and I want to make the prediction on the same variables, but not trimmed.
The variables I use in the model are climatic or environmental variables clipped to the animals' home ranges; I want to predict over a larger area. All variables have the same extent, CRS and spatial resolution.
m
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial ( logit )
Formula: as.factor(pres.abs) ~ bio.3.kernel + bio.7.kernel + bio.8.kernel + bio.9.kernel + bio.13.kernel + bio.15.kernel + prec.7.kernel + landcover.kernel + slope.k + (1 | id)
   Data: new
      AIC       BIC    logLik  deviance  df.resid
 21622.34  21707.70 -10800.17  21600.34     17315
Random effects:
 Groups Name        Std.Dev.
 id     (Intercept) 0.1911
Number of obs: 17326, groups: id, 9
Fixed Effects:
  (Intercept)   bio.3.kernel   bio.7.kernel      bio.8.kernel  bio.9.kernel  bio.13.kernel
    -0.002928      -0.019971      -0.034466          0.310813     -0.325558      -0.324485
bio.15.kernel  prec.7.kernel  landcover.kernel        slope.k
    -0.671262       0.399149          0.142602       0.429925
names(var2)
[1] "bio.3.kernel" "bio.7.kernel" "bio.8.kernel" "bio.9.kernel" "bio.13.kernel"
[6] "bio.15.kernel" "prec.7.kernel" "landcover.kernel" "slope.k"
p<- predict(m, var2)
Error in as.data.frame.default(data, optional = TRUE) : class coercion 'structure("RasterBrick", package = "raster")' in data.frame not possible

Related

Gekko Variable Definition - Primary vrs. Utility Decision Variable

I am trying to formulate and solve an optimization problem based on an article. The authors introduced 2 decision variables. Power of station i at time t, P_i,t, and a binary variable X_i,n which is 1 if vehicle n is assigned to station i.
They introduced some other variables, called utility variables. For instance, the energy delivered from station i up to time t for vehicle n, E_i,t,n, which is calculated from the primary decision variables and a few fixed parameters.
My question is should I define the utility variables as Gekko variables? If yes, which type is more appropriate?
I = 4 # number of stations
T = 24 # hours of simulation
N = 5 # number of vehicles
p = m.Array(m.Var,(I,T),lb=0,ub= params.ev.max_power)
x = m.Array(m.Var,(I,N),lb=0,ub=1, integer = True)
Should I define E as follows to solve these equations, as an example? This introduces extra variables that are not primary decision variables and are calculated from other terms that depend on the primary decision variables.
E = m.Array(m.Var,(I,T,N),lb=0)
for i in range(I):
    for n in range(N):
        for t in range(T):
            m.Equation(E[i][t][n] >= np.sum(0.25 * availability[n, :t] * p[i,:t]) - (M * (1 - x[i][n])))
            m.Equation(E[i][t][n] <= np.sum(0.25 * availability[n, :t] * p[i,:t]) + (M * (1 - x[i][n])))
            m.Equation(E[i][t][n] <= M * x[i][n])
            m.Equation(E[i][t][n] >= -M * x[i][n])
All of those variable definitions and equations look correct. Here are a few suggestions:
There is no availability[] variable defined yet. If availability is a function of other decision variables, then it is generally more efficient to use an m.Intermediate() definition to define it.
As the total number of decision variables increases, there is often a large increase in computational time. I recommend starting with a small problem and then scaling up to the larger problem.
Try the gekko m.sum() instead of sum or np.sum() for potentially more efficient calculations. Using m.sum() does increase the model compile time but generally decreases the optimization solve time, so it is a trade-off.
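To see what the four big-M inequalities from the question do to E, here is a minimal plain-Python sketch (no Gekko involved). The function name `energy_bounds` and the sample numbers are made up for illustration; it assumes M is at least as large as any energy that can be delivered, and it includes the lb=0 bound on E.

```python
def energy_bounds(delivered, x, big_m):
    """Feasible interval for E implied by the four big-M inequalities.

    delivered: the cumulative energy term (the sum in the equations)
    x:         0 or 1, the assignment variable x[i][n]
    big_m:     the big-M constant, assumed >= delivered
    """
    # E >= delivered - M*(1 - x), E >= -M*x, and E >= 0 (lb=0 on E)
    lower = max(0.0, delivered - big_m * (1 - x), -big_m * x)
    # E <= delivered + M*(1 - x), E <= M*x
    upper = min(delivered + big_m * (1 - x), big_m * x)
    return lower, upper


# Vehicle assigned to the station: E is pinned to the delivered energy.
assigned = energy_bounds(delivered=7.5, x=1, big_m=100.0)
# Vehicle not assigned: E is pinned to zero.
unassigned = energy_bounds(delivered=7.5, x=0, big_m=100.0)
print(assigned, unassigned)
```

When x is 1 the interval collapses onto the delivered energy, and when x is 0 it collapses onto zero, so E is fully determined by p and x. That is why it behaves as a utility variable rather than an independent decision.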

systemfit 3SLS Testing for Overidentification Restrictions

I'm currently struggling to find a good way to perform the Hansen/Sargan test of overidentifying restrictions for a Three-Stage Least Squares (3SLS) model on panel data in R. I spent the whole day searching different sites and couldn't find a way to run the test in R using the well-known systemfit package.
Currently, my code is simple.
violence_c_3sls <- Crime ~ ln_GDP +I(ln_GDP^2) + ln_Gini
income_c_3sls <-ln_GDP ~ Crime + ln_Gini
gini_c_3sls <- ln_Gini ~ ln_GDP + I(ln_GDP^2) + Crime
inst <- ~ Educ_Gvmnt_Exp + I(Educ_Gvmnt_Exp^2)+ Health_Exp + Pov_Head_Count_1.9
system_c_3sls <- list(violence_c_3sls, income_c_3sls, gini_c_3sls)
fitsur_c_3sls <-systemfit(system_c_3sls, "3SLS",inst=inst, data=df_new, methodResidCov = "noDfCor" )
summary(fitsur_c_3sls)
However, adding more instruments to create an over-identified system does not produce a Hansen/Sargan test in the output, so I assume the test has to be computed separately, probably from the systemfit object.
Thanks in advance.
With g equations, l exogenous variables, and k regressors in total, the Sargan test statistic for 3SLS is
$\hat{u}^\top (\hat{\Sigma}^{-1} \otimes P_W)\, \hat{u} \sim \chi^2(gl - k)$
where $\hat{u}$ is the vector of stacked residuals, $\hat{\Sigma}$ is the estimated residual covariance matrix, and $P_W$ is the projection matrix onto the exogenous variables. See Ch. 12.4 of Davidson & MacKinnon, ETM.
Calculating the Sargan test from systemfit should look something like this:
library(stringr) # provides str_detect() and str_remove()

sargan.systemfit=function(results3sls){
  result <- list()
  u=as.matrix(resid(results3sls)) #model residuals, n x n_eq
  n_eq=length(results3sls$eq) # number of equations
  n=nrow(u) #number of observations
  n_reg=length(coef(results3sls)) # total number of regressors
  w=model.matrix(results3sls,which='z') #Matrix of instruments, in block diagonal form with one block per equation
  #Need to aggregate into a single block (in case different instruments used per equation)
  w_list=lapply(X = 1:n_eq,FUN = function(eq_i){
    this_eq_label=results3sls$eq[[eq_i]]$eqnLabel
    this_w=w[str_detect(rownames(w),this_eq_label),str_detect(colnames(w),this_eq_label)]
    colnames(this_w)=str_remove(colnames(this_w),paste0(this_eq_label,'_'))
    return(this_w)
  })
  w=do.call(cbind,w_list)
  w=w[,!duplicated(colnames(w))]
  n_inst=ncol(w) #w is n x n_inst, where n_inst is the number of unique instruments/exogenous variables
  #estimate residual variance (or use residCov, should be asymptotically equivalent)
  var_u=crossprod(u)/n #var_u=results3sls$residCov
  P_w=w%*%solve(crossprod(w))%*%t(w) #Projection matrix on instruments w
  #as.numeric(u) vectorizes the residuals column by column into a n_eq*n x 1 vector.
  result$statistic <- as.numeric(t(as.numeric(u))%*%kronecker(solve(var_u),P_w)%*%as.numeric(u))
  result$df <- n_inst*n_eq-n_reg #degrees of freedom: g*l - k
  result$p.value <- 1 - pchisq(result$statistic, result$df)
  result$method = paste("Sargan over-identifying restrictions test")
  return(result)
}
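The Kronecker product in the statistic is an (n_eq*n) x (n_eq*n) matrix, which gets large quickly. As a sketch only (in Python with NumPy, on random stand-in data rather than real systemfit output), the same quadratic form can be computed as trace(Sigma^{-1} U' P_W U), where U is the n x n_eq residual matrix. The function names below are hypothetical, and the p-value step is left out.

```python
import numpy as np


def sargan_statistic(U, W):
    """Sargan statistic via the trace identity, without the Kronecker product.

    U: n x g matrix of residuals (one column per equation)
    W: n x l matrix of instruments / exogenous variables
    """
    n = U.shape[0]
    sigma = U.T @ U / n                         # g x g residual covariance, no df correction
    coef, *_ = np.linalg.lstsq(W, U, rcond=None)
    PU = W @ coef                               # P_W U, without forming the n x n projection
    return float(np.trace(np.linalg.solve(sigma, U.T @ PU)))


def sargan_statistic_kron(U, W):
    """Direct Kronecker form, kept only to check the identity on small inputs."""
    n = U.shape[0]
    sigma = U.T @ U / n
    P = W @ np.linalg.solve(W.T @ W, W.T)       # projection matrix on the instruments
    u = U.reshape(-1, order="F")                # stack columns, like as.numeric(u) in R
    return float(u @ np.kron(np.linalg.inv(sigma), P) @ u)


rng = np.random.default_rng(0)
U = rng.normal(size=(50, 3))                    # stand-in residuals: 50 obs, 3 equations
W = rng.normal(size=(50, 4))                    # stand-in instruments: 4 columns
stat_trace = sargan_statistic(U, W)
stat_kron = sargan_statistic_kron(U, W)
print(stat_trace, stat_kron)
```

Both functions should agree up to floating-point error. The trace version only ever builds n x g and g x g matrices.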

What is meant by "unit" in IDEA code duplication analysis?

IntelliJ IDEA has an ability to find duplicated code.
One can tune the number of "units" (according to their documentation) above which code is considered duplicated.
However, I can't find any explanation of what this "unit" is.
I'm looking for an answer that unambiguously defines such units.
The "units" measure is used in the option "Do not show duplicates simpler than". This option defines the minimal weight of the reported code fragments.
This weight is computed as the sum of the weights of all elements in the fragment.
And since different elements have different weights, their sum has to be measured in abstract "units".
Element weight can be roughly approximated as:
it's a statement -> 2
it's an expression/literal/identifier -> 1
otherwise -> 0
For example, the weight of x = 42; can be approximated as w(x) + w(=) + w(42) + w(;) + w(statement(x=42;)), which is roughly 1 + 0 + 1 + 0 + 2 = 4.
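The rough approximation above can be written down as a small Python sketch. The element kinds and the `fragment_weight` helper are illustrative names only, not IDEA's actual implementation or API.

```python
# Approximate per-element weights, taken from the rule of thumb above.
WEIGHTS = {
    "statement": 2,
    "expression": 1,
    "literal": 1,
    "identifier": 1,
}


def fragment_weight(element_kinds):
    """Sum the approximate weights; any kind not listed counts as 0."""
    return sum(WEIGHTS.get(kind, 0) for kind in element_kinds)


# Elements of `x = 42;`: x, =, 42, ;, plus the enclosing statement.
elements = ["identifier", "operator", "literal", "punctuation", "statement"]
total = fragment_weight(elements)
print(total)
```

For this fragment the sum is 4 units, matching the hand calculation.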

Dummy Variables in Julia

In R there is nice functionality for running a regression with dummy variables for each level of a categorical variable, e.g. "Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level".
Is there an equivalent way to do this in Julia?
x = randn(1000)
group = repmat(1:25 , 40)
groupMeans = randn(25)
y = 3*x + groupMeans[group]
data = DataFrame(x=x, y=y, g=group)
for i in levels(group)
    data[parse("I$i")] = data[:g] .== i
end
lm(y~x+I1+I2+I3+I4+I5+I6+I7+I8+I9+I10+
    I11+I12+I13+I14+I15+I16+I17+I18+I19+I20+
    I21+I22+I23+I24, data)
If you are using the DataFrames package, after you pool the data, the package will take care of the rest:
"Pooling columns is important for working with the GLM package. When fitting regression models, PooledDataArray columns in the input are translated into 0/1 indicator columns in the ModelMatrix - with one column for each of the levels of the PooledDataArray."
You can see the rest of documentation on pooled data here
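To make the idea of "0/1 indicator columns" concrete, here is a minimal language-neutral sketch in plain Python. It is not the DataFrames or GLM implementation; the `indicator_columns` helper and its `drop_first` option are hypothetical names for illustration.

```python
def indicator_columns(values, drop_first=False):
    """Expand a categorical column into one 0/1 column per level.

    drop_first=True omits the first level, as is usual when the
    regression also contains an intercept.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return {level: [1 if v == level else 0 for v in values] for level in levels}


group = [1, 2, 3, 1, 2]
cols = indicator_columns(group)
reduced = indicator_columns(group, drop_first=True)
print(cols)
print(list(reduced))
```

Dropping one level mirrors the question's formula, which uses I1 to I24 for 25 groups.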

How to understand bias parameter in LIBLINEAR?

I don't understand the meaning of the bias parameter in the LIBLINEAR API. Why is it specified by the user during training? Shouldn't it just be the distance from the separating hyperplane to the origin, which is a parameter of the learned model?
This is from the README:
struct problem
{
int l, n;
int *y;
struct feature_node **x;
double bias;
};
If bias >= 0, we assume that one additional feature is added to the end of each data instance.
What is this additional feature?
Let's look at the equation for the separating hyperplane:
w_1 * x_1 + w_2 * x_2 + w_3 * x_3 + ... + w_bias * x_bias = 0
Where x are the feature values and w are the trained "weights". The additional feature x_bias is a constant, whose value is equal to the bias. If bias = 0, you will get a separating hyperplane going through the origin (0,0,0,...). You can imagine many cases, where such a hyperplane is not the optimal separator.
The value of the bias affects the margin through scaling of w_bias. Therefore the bias is a tuning parameter, which is usually determined through cross-validation similar to other parameters.
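The effect of the extra constant feature can be sketched in a few lines of Python. This is not LIBLINEAR code; `augment`, `decision_value` and the weights are made-up names and numbers that only illustrate the hyperplane equation above.

```python
def augment(x, bias):
    """Append the constant bias feature when bias >= 0, as the README describes."""
    return list(x) + [bias] if bias >= 0 else list(x)


def decision_value(w, x, bias):
    """Left-hand side of the hyperplane equation: sum of w_j * x_j."""
    return sum(wj * xj for wj, xj in zip(w, augment(x, bias)))


w = [1.0, -1.0, 0.5]      # hypothetical weights; the last one is w_bias
origin = [0.0, 0.0]

with_bias = decision_value(w, origin, bias=1.0)   # extra feature equals 1
zero_bias = decision_value(w, origin, bias=0.0)   # extra feature equals 0
print(with_bias, zero_bias)
```

With bias = 0 the extra feature contributes nothing, so the origin always has decision value 0 and the hyperplane passes through it. With bias = 1 the term w_bias shifts the hyperplane away from the origin.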