Dirichlet parameters don't update in JAGS - bayesian

I am trying to run a hierarchical Dirichlet model in JAGS, but the parameters do not update, so I must be doing something wrong. I am trying to approximate the Dirichlet with gamma distributions:
#Creating some data
set.seed(555)
cat1=rbeta(15,20,60)
cat2=rbeta(15,20,80)
cat3=rbeta(15,20,160)
cat4=1-cat1-cat2-cat3
dat_dirich=list(
dirF=cbind(cat1, cat2, cat3, cat4),
AL=c(1,1,1,1),
K=4
)
require(runjags)
{
catch_mod="model{
for(y in 1:15){
dirF[y,1:K]~ddirch(alpha_dirF1B[y,1:K])
#Approximation with Gamma
for(a in 1:K){
alpha_dirF1B[y,a]<-P2[y,a]/sum(P2[y,1:K])
P2[y,a]~dgamma(alpha_dirF1[a,y],1)#1 or kappa_dirF
}
#hierarchical structure
alpha_dirF1[1:K,y]~ddirch(AL*kappa_dirF)
}
kappa_dirF~dunif(0.1,5000) #kappa_dirF~dlnorm(0,0.01)
}"
}

You have to have a call to run.jags() somewhere in your code. This should look something like
results <- run.jags(catch_mod, data = dat_dirich, n.chains = XXX, ...)
This is described in the vignette for runjags (https://cran.r-project.org/web/packages/runjags/vignettes/quickjags.html)
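The gamma approximation mentioned in the question rests on a standard fact: independent Gamma(alpha_i, 1) draws divided by their sum follow a Dirichlet(alpha) distribution. A minimal Python sketch of that construction is below. It only illustrates the sampling identity and is not a fix for the JAGS model; the concentration values and the helper name `dirichlet_from_gammas` are made up for the example.

```python
import random

def dirichlet_from_gammas(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(555)
alpha = [2.0, 3.0, 5.0]  # illustrative concentration parameters, not taken from the question
samples = [dirichlet_from_gammas(alpha, rng) for _ in range(5000)]

# Empirical component means should be close to alpha_i / sum(alpha).
means = [sum(s[i] for s in samples) / len(samples) for i in range(len(alpha))]
expected = [a / sum(alpha) for a in alpha]
```

Each sample sums to 1, and `means` should land near `expected`, which is what the normalized gamma draws are supposed to reproduce.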

Related

Tensorflow v2.10 mutate output of signature function to be a map of label to results

I'm trying to save my model so that when called from tf-serving the output is:
{
"results": [
{ "label1": x.xxxxx, "label2": x.xxxxx },
{ "label1": x.xxxxx, "label2": x.xxxxx }
]
}
where label1 and label2 are my labels and x.xxxxx are the probability of that label.
This is what I'm trying:
class TFModel(tf.Module):
    def __init__(self, model: tf.keras.Model) -> None:
        self.labels = ['label1', 'label2']
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec(shape=(1, ), dtype=tf.string)])
    def prediction(self, pagetext: str):
        return {
            'results': tf.constant([{k: v for dct in [{self.labels[c]: f"{x:.5f}"} for (c,x) in enumerate(results[i])] for k, v in dct.items()}
                                    for i in range(len(results.numpy()))])}

# and then save it:
tf_model_wrapper = TFModel(classifier_model)
tf.saved_model.save(tf_model_wrapper.model,
                    saved_model_path,
                    signatures={'serving_default':tf_model_wrapper.prediction}
                    )
Side note: apparently in TensorFlow v2.0, if signatures is omitted, it should scan the object for the first @tf.function (according to this: https://www.tensorflow.org/api_docs/python/tf/saved_model/save), but in reality that doesn't seem to work. Instead, the model saves successfully with no errors, the @tf.function is not called, and the default output is returned.
The error I get from the above is:
ValueError: Got a non-Tensor value <tf.Operation 'PartitionedCall' type=PartitionedCall> for key 'output_0' in the output of the function __inference_prediction_125493 used to generate the SavedModel signature 'serving_default'. Outputs for functions used as signatures must be a single Tensor, a sequence of Tensors, or a dictionary from string to Tensor.
I wrapped the result in tf.constant above because of this error, thinking it might be a quick fix, but I think it's me just being naive and not understanding Tensors properly.
I tried a bunch of other things before learning that all outputs must be return values.
How can I change the output to be as I want it to be?
You can think of a Tensor as a multidimensional array, i.e., a structure with a fixed size and number of dimensions whose elements all share the same type. Your return value is a map from a string to a list of dictionaries. A list of dictionaries cannot be converted to a tensor, because there is no guarantee that the number of dimensions and their sizes are constant, nor that every element has the same type.
You could instead return the raw output of your network, which should be a tensor, and do your post-processing outside of TensorFlow Serving.
If you really want to do something like in your question, you can use a Tensor of strings instead, with code like this:
labels = tf.constant(['label1', 'label2'])
# if your batch size is dynamic, you can use tf.shape on your results variable to find it at runtime
batch_size = 32
# assuming your model returns something with the shape (N,2)
results = tf.random.uniform((batch_size,2))
res_as_str = tf.strings.as_string(results, precision=5)
return {
"results": tf.stack(
[tf.tile(labels[None, :], [batch_size, 1]), res_as_str], axis=-1
)
}
The output will be a dictionary mapping the value "results" to a Tensor of dimensions (Batch, number of labels, 2), the last dimension containing the label name and its corresponding value.
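For the other option mentioned above, post-processing outside of TensorFlow Serving, a minimal client-side sketch could look like the following. It assumes the client has already received the raw scores as a list of rows with one value per label; the function name `to_label_map` and the sample values are made up for illustration.

```python
def to_label_map(scores, labels, precision=5):
    """Turn raw per-example scores into the {"results": [...]} layout from the question."""
    results = []
    for row in scores:
        if len(row) != len(labels):
            raise ValueError("each score row needs one value per label")
        results.append(
            {label: round(float(s), precision) for label, s in zip(labels, row)}
        )
    return {"results": results}

# Hypothetical raw scores, standing in for whatever the serving response contains.
raw = [[0.12345678, 0.87654322], [0.5, 0.5]]
payload = to_label_map(raw, ["label1", "label2"])
```

Keeping this step in the client means the exported signature only has to return a plain float tensor.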

How to use the 'sphereize data' option with PCA in TensorFlow

I have used PCA with the 'Sphereize data' option on the following page successfully: https://projector.tensorflow.org/
I wonder how to run the same computation locally using the TensorFlow API. I found the PCA documentation in the API documentation, but I am not sure if sphereizing the data is available somewhere in the API too?
The "sphereize data" option normalizes the data by shifting each point by the centroid and then scaling it to unit norm.
Here is the code used in Tensorboard (in typescript):
normalize() {
  // Compute the centroid of all data points.
  let centroid = vector.centroid(this.points, (a) => a.vector);
  if (centroid == null) {
    throw Error('centroid should not be null');
  }
  // Shift all points by the centroid and make them unit norm.
  for (let id = 0; id < this.points.length; ++id) {
    let dataPoint = this.points[id];
    dataPoint.vector = vector.sub(dataPoint.vector, centroid);
    if (vector.norm2(dataPoint.vector) > 0) {
      // If we take the unit norm of a vector of all 0s, we get a vector of
      // all NaNs. We prevent that with a guard.
      vector.unit(dataPoint.vector);
    }
  }
}
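You can reproduce that normalization using the following Python function:
def sphereize_data(x):
    """
    x is a 2D Tensor of shape (num_vectors, dim_vectors)
    """
    centroids = tf.reduce_mean(x, axis=0, keepdims=True)
    centered = x - centroids
    # Normalize each vector (row) by its own norm, as the TypeScript code does;
    # div_no_nan leaves all-zero rows at zero instead of producing NaNs.
    return tf.math.div_no_nan(centered, tf.norm(centered, axis=1, keepdims=True))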
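To sanity-check a TensorFlow implementation against the TypeScript code, a NumPy version of the same normalization can help. This is a sketch under the assumption that each row of the input is one data point; the function name `sphereize` and the sample points are illustrative only.

```python
import numpy as np

def sphereize(x):
    """Center rows on the centroid, then scale each non-zero row to unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Rows that coincide with the centroid have zero norm and are left as zeros.
    return np.divide(centered, norms, out=np.zeros_like(centered), where=norms > 0)

points = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 1.0]])
sphered = sphereize(points)
row_norms = np.linalg.norm(sphered, axis=1)
```

After the call, every row of `sphered` that was not exactly at the centroid has norm 1, which matches the guard in the TypeScript loop.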

TensorFlow - how to import data with multiple labels

I'm trying to create a model in TensorFlow which predicts the ideal item for a user by predicting a vector of numbers.
I have created a dataset in Spark and saved it as a TFRecord using the Spark TensorFlow connector.
In the dataset, I have several hundred features and 20 labels in each row. For easier manipulation, I have given every column a prefix, 'feature_' or 'label_'.
Now I'm trying to write an input function for TensorFlow, but I can't figure out how to parse the data.
So far I have written this:
def dataset_input_fn():
    path = ['data.tfrecord']
    dataset = tf.data.TFRecordDataset(path)

    def parser(record):
        example = tf.train.Example()
        example.ParseFromString(record)
        # TODO: no idea what to do here
        # features = parsed["features"]
        # label = parsed["label"]
        # return features, label

    dataset = dataset.map(parser)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(32)
    dataset = dataset.repeat(100)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels
How can I split the Example into a feature set and a label set? I have tried to split the Example into two parts, but there is no way to even access it. The only way I have managed to access it is by printing the example out, which gives me something like this.
features {
...
feature {
key: "feature_wishlist_hour"
value {
int64_list {
value: 0
}
}
}
feature {
key: "label_emb_1"
value {
float_list {
value: 0.4
}
}
}
feature {
key: "label_emb_2"
value {
float_list {
value: 0.8
}
}
}
...
}
Your parser function should mirror how you constructed the example proto. In your case it should be something similar to:
# example proto decode
def parser(example_proto):
    keys_to_features = {'feature_wishlist_hour': tf.FixedLenFeature((), tf.int64),
                        'label_emb_1': tf.FixedLenFeature((), tf.float32),
                        'label_emb_2': tf.FixedLenFeature((), tf.float32)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return parsed_features['feature_wishlist_hour'], (parsed_features['label_emb_1'], parsed_features['label_emb_2'])
EDIT: From the comments it seems you are encoding each of the features as key, value pair, which is not right. Check this answer: Numpy to TFrecords: Is there a more simple way to handle batch inputs from tfrecords? on how to write it in a proper way.
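With several hundred 'feature_' columns and 20 'label_' columns, listing every key in the return statement gets unwieldy. One possible approach, sketched below, is to split the parsed dictionary by prefix. The helper name `split_features_and_labels` is hypothetical, and plain Python values stand in for the tensors a parsing call would return.

```python
def split_features_and_labels(parsed, feature_prefix="feature_", label_prefix="label_"):
    """Split one parsed example (a dict keyed by column name) by column prefix."""
    features = {k: v for k, v in parsed.items() if k.startswith(feature_prefix)}
    # Sort label keys so the label vector has a stable order across examples.
    label_keys = sorted(k for k in parsed if k.startswith(label_prefix))
    labels = [parsed[k] for k in label_keys]
    return features, labels

# Plain Python values stand in for parsed tensors in this sketch.
parsed = {"feature_wishlist_hour": 0, "label_emb_2": 0.8, "label_emb_1": 0.4}
features, labels = split_features_and_labels(parsed)
```

Note that the sort is lexicographic, so 'label_emb_10' would come before 'label_emb_2'; if the label order matters, an explicit list of label keys is safer.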

Changing label name when retraining Inception on Google Cloud ML

I currently follow the tutorial to retrain Inception for image classification:
https://cloud.google.com/blog/big-data/2016/12/how-to-train-and-classify-images-using-google-cloud-machine-learning-and-cloud-dataflow
However, when I make a prediction with the API, I only get the index of my class as a label. I would like the API to return a string with the actual class name instead, e.g. instead of
predictions:
- key: '0'
prediction: 4
scores:
- 8.11998e-09
- 2.64907e-08
- 1.10307e-06
I would like to get:
predictions:
- key: '0'
prediction: ROSES
scores:
- 8.11998e-09
- 2.64907e-08
- 1.10307e-06
Looking at the reference for the Google API it should be possible:
https://cloud.google.com/ml-engine/reference/rest/v1/projects/predict
I already tried changing the following in model.py from
outputs = {
    'key': keys.name,
    'prediction': tensors.predictions[0].name,
    'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
to
if tensors.predictions[0].name == 0:
    pred_name ='roses'
elif tensors.predictions[0].name == 1:
    pred_name ='tulips'
outputs = {
    'key': keys.name,
    'prediction': pred_name,
    'scores': tensors.predictions[1].name
}
tf.add_to_collection('outputs', json.dumps(outputs))
but this doesn't work.
My next idea was to change this part of the preprocess.py file, so that instead of getting the index I use the string label.
def process(self, row, all_labels):
    try:
        row = row.element
    except AttributeError:
        pass
    if not self.label_to_id_map:
        for i, label in enumerate(all_labels):
            label = label.strip()
            if label:
                self.label_to_id_map[label] = label #i
and
label_ids = []
for label in row[1:]:
    try:
        label_ids.append(label.strip())
        #label_ids.append(self.label_to_id_map[label.strip()])
    except KeyError:
        unknown_label.inc()
but this gives the error:
TypeError: 'roses' has type <type 'str'>, but expected one of: (<type 'int'>, <type 'long'>) [while running 'Embed and make TFExample']
hence I thought that I should change something here in preprocess.py, in order to allow strings:
example = tf.train.Example(features=tf.train.Features(feature={
    'image_uri': _bytes_feature([uri]),
    'embedding': _float_feature(embedding.ravel().tolist()),
}))
if label_ids:
    label_ids.sort()
    example.features.feature['label'].int64_list.value.extend(label_ids)
But I don't know how to change it appropriately, as I could not find something like str_list. Could anyone please help me out here?
Online prediction certainly allows this, but the model itself needs to be updated to do the conversion from int to string.
Keep in mind that the Python code is just building a graph which describes what computation to do in your model -- you're not sending the Python code to online prediction, you're sending the graph you build.
That distinction is important because the changes you have made are in Python -- you don't yet have any inputs or predictions, so you won't be able to inspect their values. What you need to do instead is add the equivalent lookups to the graph that you're exporting.
You could modify the code like so:
labels = tf.constant(['cars', 'trucks', 'suvs'])
predicted_indices = tf.argmax(softmax, 1)
prediction = tf.gather(labels, predicted_indices)
And leave the inputs/outputs untouched from the original code.
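To see what that in-graph lookup computes, here is a NumPy analogue run on made-up scores. It is only an illustration of the indexing; the exported model still needs the TensorFlow ops above so that the lookup is part of the graph. The score values are invented, and the labels must be listed in the same index order used during training.

```python
import numpy as np

labels = np.array(["cars", "trucks", "suvs"])  # same example labels as in the snippet above
scores = np.array([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]])  # made-up softmax outputs

predicted_indices = np.argmax(scores, axis=1)  # counterpart of tf.argmax(softmax, 1)
prediction = labels[predicted_indices]         # counterpart of tf.gather(labels, predicted_indices)
```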

How to get an importance of a class using random forest?

I am using the randomForest package on my dataset to do a classification, but with the importance command I only get the importance of whole variables. What if I want the variable importance for specific categories of a variable? For example, for a specific location within a region variable, how much does that location contribute to the total? I thought about transforming every class into a dummy variable, but I don't know if this is really a good idea.
I think you mean "variable importance by specific categories of variables". That has not been implemented, but I guess it would be possible, meaningful and perhaps useful. Of course it would not be meaningful for variables with only two categories.
I would implement it something like:
Train model -> compute out-of-bag prediction performance (OOB-cv1) -> permute specific category by specific variable (reassign this category randomly to other categories, weighted by other category prevalence) -> re-compute out-of-bag prediction performance (OOB-cv2) -> subtract OOB-cv1 from OOB-cv2
And then I wrote a function implementing category-specific variable importance.
library(randomForest)
#Create some classification problem, with mixed categorical and numeric vars
#Cat A of var 1, cat B of var 2 and Cat C of var 3 influence class the most.
X.cat = replicate(3,sample(c("A","B","C"),600,rep=T))
X.val = replicate(2,rnorm(600))
y.cat = 3*(X.cat[,1]=="A") + 3*(X.cat[,2]=="B") + 3*(X.cat[,3]=="C")
y.cat.err = y.cat+rnorm(600)
y.lim = quantile(y.cat.err,c(1/3,2/3))
y.class = apply(replicate(2,y.cat.err),1,function(x) sum(x>y.lim)+1)
y.class = factor(y.class,labels=c("ann","bob","chris"))
X.full = data.frame(X.cat,X.val)
X.full[1:3] = lapply(X.full[1:3],as.factor)
#train forest
rf=randomForest(X.full,y.class,keep.inbag=T,replace=T)
#make function to compute cross-validated classification error
oobErr = function(rf,X) {
preds = predict(rf,X,type="vote",predict.all = T)$individual
preds[rf$inbag!=0]=NA
oob.pred = apply(preds,1,function(x) {
tabx=sort(table(x),dec=T)
majority.vote = names(tabx)[1]
})
return(mean(as.character(rf$y)!=oob.pred))
}
#make function to iterate all categories of categorical variables
#and compute change of OOB class error due to permutation of category
catVar = function(rf,X,nPerm=2) {
ref = oobErr(rf,X)
catVars = which(rf$forest$ncat>1)
lapply(catVars, function(iVar) {
catImp = replicate(nPerm,{
sapply(levels(X[[iVar]]), function(thisCat) {
thisCat.ind = which(thisCat==X[[iVar]])
X[thisCat.ind,iVar] = head(sample(X[[iVar]]),length(thisCat.ind))
varImp = oobErr(rf,X)-ref
})
})
if(nPerm==1) catImp else apply(catImp,1,mean)
})
}
#try it out
out = catVar(rf,X.full,nPerm=4)
print(out) #seems like it works as it should
$X1
A B C
0.14000 0.07125 0.06875
$X2
A B C
0.07458333 0.16083333 0.07666667
$X3
A B C
0.05333333 0.08083333 0.15375000
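The same idea can be sketched in a model-agnostic way in Python. This is a simplified illustration, not a port of the R function above: it uses a plain error rate on the supplied rows rather than out-of-bag predictions, and the names `error_rate`, `category_importance`, and the toy data are all made up for the example.

```python
import random

def error_rate(predict, rows, targets):
    """Fraction of rows whose prediction differs from the target."""
    return sum(predict(r) != t for r, t in zip(rows, targets)) / len(rows)

def category_importance(predict, rows, targets, var, category, n_perm=10, seed=0):
    """Mean increase in error after reassigning rows of `category` to other categories."""
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, targets)
    # Drawing from the observed values of the other categories weights by prevalence.
    others = [r[var] for r in rows if r[var] != category]
    increases = []
    for _ in range(n_perm):
        permuted = [
            dict(r, **{var: rng.choice(others)}) if r[var] == category else r
            for r in rows
        ]
        increases.append(error_rate(predict, permuted, targets) - baseline)
    return sum(increases) / n_perm

# Toy setup: the class depends only on whether region is "A".
rows = [{"region": c} for c in "AAAABBBCCC"]
predict = lambda r: "yes" if r["region"] == "A" else "no"
targets = [predict(r) for r in rows]

imp_a = category_importance(predict, rows, targets, "region", "A")
imp_c = category_importance(predict, rows, targets, "region", "C")
```

In this toy case, scrambling category "A" misclassifies every "A" row, so its importance equals the share of "A" rows, while "C" only matters when its rows happen to be reassigned to "A".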