How to get the importance of a specific category of a variable using random forest?

I am using the randomForest package to do a classification on my dataset, but the importance command only gives me the importance of whole variables. What if I want the importance of specific categories of a variable? For example, for a specific location in a region variable, how much does that location contribute to the total? I thought about transforming every category into a dummy variable, but I don't know if this is really a good idea.

I think you mean "variable importance by specific categories of variables". That has not been implemented, but I guess it would be possible, meaningful and perhaps useful. Of course it would not be meaningful for variables with only two categories.
I would implement it something like this:
Train model -> compute out-of-bag prediction performance (OOB-cv1) -> permute a specific category of a specific variable (reassign the rows of this category randomly to other categories, weighted by the other categories' prevalence) -> re-compute out-of-bag prediction performance (OOB-cv2) -> subtract OOB-cv1 from OOB-cv2
And then I wrote a function implementing category-specific variable importance.
library(randomForest)
#Create some classification problem, with mixed categorical and numeric vars
#Cat A of var 1, cat B of var 2 and Cat C of var 3 influence class the most.
X.cat = replicate(3,sample(c("A","B","C"),600,rep=T))
X.val = replicate(2,rnorm(600))
y.cat = 3*(X.cat[,1]=="A") + 3*(X.cat[,2]=="B") + 3*(X.cat[,3]=="C")
y.cat.err = y.cat+rnorm(600)
y.lim = quantile(y.cat.err,c(1/3,2/3))
y.class = apply(replicate(2,y.cat.err),1,function(x) sum(x>y.lim)+1)
y.class = factor(y.class,labels=c("ann","bob","chris"))
X.full = data.frame(X.cat,X.val)
X.full[1:3] = lapply(X.full[1:3],as.factor)
#train forest
rf=randomForest(X.full,y.class,keep.inbag=T,replace=T)
#make function to compute cross-validated (out-of-bag) classification error
oobErr = function(rf,X) {
  #per-tree predictions, one column per tree
  preds = predict(rf,X,type="vote",predict.all = TRUE)$individual
  #blank out predictions from trees that saw the row during training
  preds[rf$inbag!=0]=NA
  oob.pred = apply(preds,1,function(x) {
    tabx=sort(table(x),decreasing=TRUE)
    majority.vote = names(tabx)[1]
  })
  return(mean(as.character(rf$y)!=oob.pred))
}
#make function to iterate all categories of categorical variables
#and compute change of OOB class error due to permutation of category
catVar = function(rf,X,nPerm=2) {
  ref = oobErr(rf,X)
  catVars = which(rf$forest$ncat>1)
  lapply(catVars, function(iVar) {
    catImp = replicate(nPerm,{
      sapply(levels(X[[iVar]]), function(thisCat) {
        thisCat.ind = which(thisCat==X[[iVar]])
        X[thisCat.ind,iVar] = head(sample(X[[iVar]]),length(thisCat.ind))
        varImp = oobErr(rf,X)-ref
      })
    })
    if(nPerm==1) catImp else apply(catImp,1,mean)
  })
}
#try it out
out = catVar(rf,X.full,nPerm=4)
print(out) #seems like it works as it should
$X1
      A       B       C
0.14000 0.07125 0.06875
$X2
         A          B          C
0.07458333 0.16083333 0.07666667
$X3
         A          B          C
0.05333333 0.08083333 0.15375000
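For readers who want to see the permutation idea in isolation, here is a minimal sketch in plain Python. It is not the randomForest implementation and it does not use out-of-bag predictions: toy_predict is a hypothetical stand-in for a fitted classifier, the data are made up, and the error is computed on the same rows. It follows the textual recipe (reassign the rows of one category to the other categories, weighted by their prevalence), which differs slightly from the R code, where the replacement values are drawn from the whole column.

```python
import random
from collections import Counter

def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(predict(r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)

def category_importance(predict, rows, labels, var, rng):
    """Error increase after scrambling each category of column `var`."""
    baseline = error_rate(predict, rows, labels)
    counts = Counter(r[var] for r in rows)
    importance = {}
    for cat in sorted(counts):
        # Candidate replacements: every other category, weighted by prevalence.
        others = [c for c in sorted(counts) if c != cat]
        weights = [counts[c] for c in others]
        permuted = []
        for r in rows:
            if r[var] == cat:
                r = dict(r)  # copy so the original data stays untouched
                r[var] = rng.choices(others, weights=weights)[0]
            permuted.append(r)
        importance[cat] = error_rate(predict, permuted, labels) - baseline
    return importance

# Toy data (assumption): the label depends only on whether region == "A".
rng = random.Random(0)
rows = [{"region": rng.choice("ABC")} for _ in range(600)]
labels = [int(r["region"] == "A") for r in rows]

def toy_predict(row):
    # Hypothetical stand-in for a fitted classifier.
    return int(row["region"] == "A")

imp = category_importance(toy_predict, rows, labels, "region", rng)
print(imp)
```

In this toy setup category "A" should come out as the most important one, because scrambling it misclassifies every row that used to be "A", while scrambling "B" or "C" only hurts the rows that happen to be reassigned to "A".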


Is there a method for converting a wimids object to a mids object?

Suppose I create 10 multiply-imputed datasets and use the (wonderful) MatchThem package in R to create weights for my exposure variable. The MatchThem package takes a mids object and converts it to an object of the class wimids.
My desired output is a mids object, but with weights. I hope to pass this mids object to brms as follows:
library(brms)
m0 <- brm_multiple(Y|weights(weights) ~ A, data = mids_data)
Open to suggestions.
EDIT: Noah's solution below will unfortunately not work.
The package's first author, Farhad Pishgar, sent me the following elegant solution. It will create a mids object from a wimids object. Thank you Farhad!
library(mice)
library(MatchThem)
#"weighted.datasets" is our .wimids object
#Extracting the original dataset with missing values
maindataset <- complete(weighted.datasets, action = 0)
#Some spit-and-polish
maindataset <- data.frame(.imp = 0, .id = seq_len(nrow(maindataset)), maindataset)
#Extracting imputed-weighted datasets in the long format
alldataset <- complete(weighted.datasets, action = "long")
#Binding them together
alldataset <- rbind(maindataset, alldataset)
#Converting to .mids
newmids <- as.mids(alldataset)
Additionally, for BRMS, I worked out this solution which instead creates a list of dataframes. It will work in fewer steps.
library("mice")
library("dplyr")
library("MatchThem")
library("brms") # for bayesian estimation.
# Note, I realise that my approach here is not fully Bayesian, but that is a good thing! I need to ensure balance in the exposure.
# impute missing data
data("nhanes2")
imp <- mice(nhanes2, printFlag = FALSE, seed = 0, m = 10)
# MatchThem. This is just a fast method
w_imp <- weightthem(hyp ~ chl + age, data = imp,
                    approach = "within",
                    estimand = "ATE",
                    method = "ps")
# get individual data frames with weights
out <- complete(w_imp, action ="long", include = FALSE, mild = TRUE)
# assemble individual data frames into a list
m <- 10
listdat <- list()
for (i in 1:m) {
  listdat[[i]] <- as.data.frame(out[[i]])
}
# pass the list to brms, and it runs as it should!
fit_1 <- brm_multiple(bmi|weights(weights) ~ age + hyp + chl,
                      data = listdat,
                      backend = "cmdstanr",
                      family = "gaussian",
                      set_prior('normal(0, 1)',
                                class = 'b'))
brm_multiple() can take in a list of data frames for its data argument. You can produce this from the wimids object using complete(). The output of complete() with action = "all" is a mild object, which is a list of data frames, but this is not recognized by brm_multiple() as such. So, you can just convert it to a list. This should look like the following:
df_list <- complete(mids_data, "all")
class(df_list) <- "list"
m0 <- brm_multiple(Y|weights(weights) ~ A, data = df_list)
Using complete() automatically adds a weights column to the resulting imputed data frames.

In pyscipopt, would it be possible to use a function containing an optimization model inside of my main optimization model?

I am using Jupyter Notebook. I have tried defining a function that contains an optimization model, and it seems to work outside of my main model. When I tried calling the function on a variable inside my main model, at first the kernel died; after I updated Anaconda, it now seems to do nothing.
My function:
def optfunc(x):
    mod = Model()
    y = mod.addVar("y", ub = 2, lb = -1)
    consl = mod.addCons(y + x <= 3, "cons")
    mod.setObjective(y, "maximize")
    mod.optimize()
    sol = mod.getBestSol()
    return mod.getSolVal(sol, y)
My main model:
mainfunc = Model()
n = mainfunc.addVar("n",lb=1,ub=3)
c = optfunc(n)
const = mainfunc.addCons(n + 0.5 == 1, "cons")
mainfunc.setObjective(n, "maximize")
mainfunc.optimize()
sol = mainfunc.getBestSol()
print(mainfunc.getSolVal(sol,n))
This does not work. You cannot nest a Model inside another Model, and in particular you cannot pass a variable of the main Model (n, which becomes x inside optfunc) into the sub-model as if it were one of the sub-model's own variables.
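To see why optfunc only makes sense for a number, note that for a fixed numeric x the inner problem (maximize y subject to y + x <= 3 and -1 <= y <= 2) has the optimum min(2, 3 - x) whenever that value is at least -1. The sketch below is plain Python, not PySCIPOpt, and the name inner_optimum is made up for illustration; it only shows that the sub-problem's result is a function of a concrete value of x, which a decision variable of the outer model does not have before the outer model is solved.

```python
def inner_optimum(x, lb=-1.0, ub=2.0, rhs=3.0):
    """Maximize y subject to y + x <= rhs and lb <= y <= ub, for a numeric x."""
    y = min(ub, rhs - x)
    if y < lb:
        raise ValueError("inner problem is infeasible for x = %r" % x)
    return y

# Works for plain numbers, which is what optfunc effectively needs.
for x in (1.0, 2.0, 3.0):
    print(x, inner_optimum(x))
```

Separately, and independent of the nesting issue, the constraint n + 0.5 == 1 forces n = 0.5, which conflicts with the bound lb=1, so the main model as posted looks infeasible.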

How to change the mutable parameter in Pyomo (AbstractModel)?

I am trying to update the mutable parameter Nc in my AbstractModel.
The initial value is 4.
I construct the instance, then change instance.Nc to 5 and solve it, but the value is still 4 (the initial value). Can anybody help?
from pyomo.environ import *
import random
model = AbstractModel()
model.i = RangeSet(40)
model.j = Set(initialize=model.i)
model.x = Var(model.i,model.j, initialize=0,within=Binary)
model.y = Var(model.i, within=Binary)
model.Nc=Param(initialize=5,mutable=True)
def Ninit(model,i):
    return random.randint(0,1)
model.N=Param(model.i,initialize=Ninit,mutable=True)
def Dinit(model,i,j):
    return random.random()
model.D=Param(model.i,model.j,initialize=Dinit,mutable=True)
def rule_C1(model,i,j):
    return model.x[i,j]<=model.N[i]*model.y[j]
model.C1 = Constraint(model.i,model.j,rule=rule_C1)
def rule_C2(model):
    return sum(model.y[i] for i in model.i )==model.Nc
model.C2 = Constraint(rule=rule_C2)
def rule_C3(model,i):
    return sum(model.x[i,j] for j in model.j)==model.N[i]
model.C3 = Constraint(model.i,rule=rule_C3)
def rule_OF(model):
    return sum( model.x[i,j]*model.D[i,j] for i in model.i for j in model.j )
model.obj = Objective(rule=rule_OF, sense=minimize)
opt = SolverFactory('glpk')
#model.NC=4
instance = model.create_instance()
instance.NC=4
results = opt.solve(instance) # solves and updates instance
print('NC= ',value(instance.Nc))
print('OF= ',value(instance.obj))
It seems you are actually initializing your parameter Nc to 5 (model.Nc=Param(initialize=5,mutable=True)) and then trying to change it to 4 once you create the instance (instance.NC=4), so you might want to do the opposite (model.Nc=Param(initialize=4,mutable=True), then instance.Nc=5).
Also, note that you are addressing the Nc parameter inconsistently throughout the code. When you declare the parameter you name it "Nc" (model.Nc=Param(initialize=5,mutable=True)), which is the actual Python attribute that Pyomo will use in the model, but later you try to change it with capital letters, "NC", which is not a parameter (instance.NC=4). Python names are case-sensitive, so that assignment never touches Nc. Minor typos like these can cause confusion and give you errors. Make sure to fix them and give it another try.
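The effect of the Nc/NC mix-up can be reproduced without Pyomo. The sketch below uses a made-up Instance class as a stand-in for the model instance (it is not a Pyomo class). I believe Pyomo blocks behave the same way for names that are not declared components, but treat that as an assumption and check it against your Pyomo version.

```python
class Instance:
    """Hypothetical stand-in for a model instance; not a Pyomo class."""
    def __init__(self):
        self.Nc = 5  # plays the role of the declared parameter

inst = Instance()
inst.NC = 4                # typo: silently creates a new attribute named NC
after_typo = inst.Nc       # the declared attribute was not touched
inst.Nc = 4                # correct spelling updates the intended attribute
after_fix = inst.Nc
print(after_typo, after_fix)
```

No error is raised at the typo, which is why the solve still sees the initial value.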

How to read parameters of layers of .tflite model in python

I was trying to read a tflite model and pull out all the parameters of the layers.
My steps:
I generated the flatbuffers model representation by running (please build flatc first):
flatc --python tensorflow/tensorflow/lite/schema/schema.fbs
The result is a tflite/ folder that contains layer description files (*.py) and some utility files.
I successfully loaded the model:
In case of an ImportError, set PYTHONPATH to point to the folder where tflite/ is.
from tflite.Model import Model
def read_tflite_model(file):
    buf = open(file, "rb").read()
    buf = bytearray(buf)
    model = Model.GetRootAsModel(buf, 0)
    return model
I partly pulled the model and node parameters out, but got stuck iterating over the nodes:
Model part:
def print_model_info(model):
    version = model.Version()
    print("Model version:", version)
    description = model.Description().decode('utf-8')
    print("Description:", description)
    subgraph_len = model.SubgraphsLength()
    print("Subgraph length:", subgraph_len)
Nodes part:
def print_nodes_info(model):
    # what does this 0 mean? should it always be zero?
    subgraph = model.Subgraphs(0)
    operators_len = subgraph.OperatorsLength()
    print('Operators length:', operators_len)
    from collections import deque
    nodes = deque(subgraph.InputsAsNumpy())
    STEP_N = 0
    MAX_STEPS = operators_len
    print("Nodes info:")
    while len(nodes) != 0 and STEP_N <= MAX_STEPS:
        print("MAX_STEPS={} STEP_N={}".format(MAX_STEPS, STEP_N))
        print("-" * 60)
        node_id = nodes.pop()
        print("Node id:", node_id)
        tensor = subgraph.Tensors(node_id)
        print("Node name:", tensor.Name().decode('utf-8'))
        print("Node shape:", tensor.ShapeAsNumpy())
        # which type is it? what does it mean?
        type_of_tensor = tensor.Type()
        print("Tensor type:", type_of_tensor)
        quantization = tensor.Quantization()
        min = quantization.MinAsNumpy()
        max = quantization.MaxAsNumpy()
        scale = quantization.ScaleAsNumpy()
        zero_point = quantization.ZeroPointAsNumpy()
        print("Quantization: ({}, {}), s={}, z={}".format(min, max, scale, zero_point))
        # I do not understand it again. what is j, that I set to 0 here?
        operator = subgraph.Operators(0)
        for i in operator.OutputsAsNumpy():
            nodes.appendleft(i)
        STEP_N += 1
    print("-"*60)
Please point me to documentation or some examples of using this API.
My problems are:
I cannot find documentation for this API.
Iterating over Tensor objects does not seem possible, as they do not have Inputs and Outputs methods. Also, in subgraph.Operators(j=0) I do not understand what j means. Because of that, my loop goes through only two nodes: the input (once) and the next one over and over again.
Iterating over Operator objects is surely possible:
Here we iterate over them all, but I cannot work out how to map an Operator to a Tensor.
def print_in_out_info_of_all_operators(model):
    # what does this 0 mean? should it always be zero?
    subgraph = model.Subgraphs(0)
    for i in range(subgraph.OperatorsLength()):
        operator = subgraph.Operators(i)
        print('Outputs', operator.OutputsAsNumpy())
        print('Inputs', operator.InputsAsNumpy())
I do not understand how to pull parameters out of an Operator object. The BuiltinOptions method gives me a Table object, and I do not know what to map it to.
subgraph = model.Subgraphs(0)
What does this 0 mean? Should it always be zero? Obviously not, but what is it? The id of the subgraph? If so, I'm happy. If not, please try to explain it.
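One possible way to think about the traversal, sketched with plain Python data instead of the generated tflite classes. It assumes, as the comments in schema.fbs suggest, that the integers returned by InputsAsNumpy() and OutputsAsNumpy() are indices into the subgraph's tensor list (so subgraph.Tensors(i) is the tensor behind index i), and that the argument of Operators(j) and Subgraphs(j) is simply a position in the corresponding list. The tensor names, operator names and indices below are invented for illustration.

```python
from collections import deque

# Hypothetical stand-in for one subgraph: tensors are addressed by index,
# and each operator lists the indices of its input and output tensors.
tensors = ["input", "conv_weights", "conv_out", "fc_weights", "logits"]
operators = [
    {"name": "CONV_2D", "inputs": [0, 1], "outputs": [2]},
    {"name": "FULLY_CONNECTED", "inputs": [2, 3], "outputs": [4]},
]
graph_inputs = [0]

# Map each tensor index to the operators that consume it.
consumers = {}
for op_index, op in enumerate(operators):
    for t in op["inputs"]:
        consumers.setdefault(t, []).append(op_index)

# Breadth-first walk from the graph inputs. An operator is visited as soon
# as one of its inputs is reached; constant inputs such as weights are not
# waited for.
visited_ops = []
queue = deque(graph_inputs)
seen_tensors = set(graph_inputs)
while queue:
    t = queue.popleft()
    for op_index in consumers.get(t, []):
        if op_index in visited_ops:
            continue
        visited_ops.append(op_index)
        for out in operators[op_index]["outputs"]:
            if out not in seen_tensors:
                seen_tensors.add(out)
                queue.append(out)

order = [operators[i]["name"] for i in visited_ops]
print(order)
```

With the real API the same idea would loop over range(subgraph.OperatorsLength()), calling subgraph.Operators(i) for each i rather than Operators(0) every time, which would explain why the loop in the question keeps returning to the same node.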

How to map different indices in Pyomo?

I am a new Pyomo/Python user. I need to formulate one set of constraints indexed by 'n', where each of the 3 components has its own index set, but each of those indices is associated with an 'n'. I am curious how I can map the relationship between these sets.
In my case, I read CSV files whose indices are related to 'n' to generate my sets. For example: a1.n1, a2.n3, a3.n5 /// b1.n2, b2.n4, b3.n6, b4.n7 /// c1.n1, c2.n2, c3.n4, c4.n6 ///. The constraint expressions for indices n1 and n2, for example, are as follows:
for n1: P(a1.n1) + L(c1.n1) == D(n1)
for n2: - F(b1.n2) + L(c2.n2) == D(n2)
Now let's go to the code. The code that creates the sets is as follows; it is inside a class:
import pyomo
import pandas
import pyomo.opt
import pyomo.environ as pe
class MyModel:
    def __init__(self, Afile, Bfile, Cfile):
        self.A_data = pandas.read_csv(Afile)
        self.A_data.set_index(['a'], inplace = True)
        self.A_data.sort_index(inplace = True)
        self.A_set = self.A_data.index.unique()
        ... ...
Then I tried to map the relationship in the constraint construction as follows:
    def createModel(self):
        self.m = pe.ConcreteModel()
        self.m.A_set = pe.Set( initialize = self.A_set )
        def obj_rule(m):
            return ...
        self.m.OBJ = pe.Objective(rule = obj_rule, sense = pe.minimize)
        def constr(m, n):
            # select the a, b and c indices that are attached to this n
            As = self.A_data.reset_index()
            Amap = As[ As['n'] == n ]['a']
            Bs = self.B_data.reset_index()
            Bmap = Bs[ Bs['n'] == n ]['b']
            Cs = self.C_data.reset_index()
            Cmap = Cs[ Cs['n'] == n ]['c']
            # .loc replaces .ix, which was removed in pandas 1.0
            return sum(m.P[(p,n)] for p in Amap) - sum(m.F[(s,n)] for s in Bmap) + sum(m.L[(r,n)] for r in Cmap) == self.D_data.loc[n, 'D']
        self.m.cons = pe.Constraint(self.m.D_set, rule = constr)
    def solve(self):
        ... ...
Finally, the following error is raised when I run this:
KeyError: "Index '(1, 1)' is not valid for indexed component 'P'"
I know this is the wrong way to do it, so I am wondering if there is a good way to map their relationships. Thanks in advance!
Gabriel
I forgot to post the answer to my own question when I solved this a week ago. The key to this problem is setting up a mapped index.
Let me just modify the code in the question. First, we need to modify the dataframe to include the information about the mapped indices. Then the set for the mapped index can be constructed, taking 2 mapped indices as an example:
self.m.A_set = pe.Set( initialize = self.A_set, dimen = 2 )
The names of the two mapped indices are 'alpha' and 'beta' respectively. Then the constraint can be formulated, based on the variables declared at the beginning:
def constr(m, n):
    Amap = self.A_data[ self.A_data['alpha'] == n ]['beta']
    Bmap = self.B_data[ self.B_data['alpha'] == n ]['beta']
    return sum(m.P[(i,n)] for i in Amap) + sum(m.L[(r,n)] for r in Bmap) == D.loc[n, 'D']
m.TravelingBal = pe.Constraint(m.A_set, rule = constr)
The summations group all entries associated with n through the mapped index set.
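The lookup from n to the associated component indices can also be built once, outside the constraint rule, instead of filtering the dataframe on every call. The following is a minimal sketch in plain Python using the example pairs from the question; the names A_pairs, B_pairs, C_pairs and build_map are made up for illustration, and in the real model the pairs would come from the CSV files.

```python
from collections import defaultdict

# (component index, n) pairs as they might be read from the CSV files;
# the values mirror the example pairs given in the question.
A_pairs = [("a1", "n1"), ("a2", "n3"), ("a3", "n5")]
B_pairs = [("b1", "n2"), ("b2", "n4"), ("b3", "n6"), ("b4", "n7")]
C_pairs = [("c1", "n1"), ("c2", "n2"), ("c3", "n4"), ("c4", "n6")]

def build_map(pairs):
    """Return {n: [component indices attached to n]}."""
    mapping = defaultdict(list)
    for comp, n in pairs:
        mapping[n].append(comp)
    return mapping

A_map, B_map, C_map = (build_map(p) for p in (A_pairs, B_pairs, C_pairs))

for n in ("n1", "n2"):
    # .get avoids inserting empty entries into the defaultdict
    print(n, A_map.get(n, []), B_map.get(n, []), C_map.get(n, []))
```

For n1 this yields one A entry and one C entry, and for n2 one B entry and one C entry, matching the two example constraints in the question. Inside a rule, the sums would then iterate over A_map.get(n, []) and so on. For m.P[(p, n)] to be a valid index, the variables presumably also have to be declared over these same (component, n) pairs, which is what the 2-dimensional set in the answer provides.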