Why are Bayesian GLM values negative?

I have a very basic PyMC3 GLM model with a Bernoulli likelihood and was wondering why the values are negative. I must be doing something wrong, but I want to be able to make predictions with the model.
How would I go about making predictions if the values are negative? I would like to give an x value, run it through the model, and get a probability. Can anyone explain how to do that?
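If the negative values are the fitted coefficients, they are most likely expected rather than a bug. Assuming a logistic (logit-link) model for the Bernoulli outcome, the coefficients and the linear predictor live on the log-odds scale, where any real value is allowed and a negative log-odds simply means a probability below 0.5. To turn an x value into a probability, apply the inverse logit (sigmoid) to intercept + slope * x for each posterior draw and average the results. The sketch below uses plain NumPy; predict_proba and its argument names are hypothetical, and the draws would come from your fitted trace (variable names depend on how the model was built).

```python
import numpy as np

def predict_proba(x, intercept_samples, slope_samples):
    """Posterior-mean P(y = 1 | x) for a logit-link GLM and a scalar input x.

    intercept_samples / slope_samples: hypothetical 1-D arrays of posterior
    draws taken from the fitted trace.
    """
    intercept_samples = np.asarray(intercept_samples, dtype=float)
    slope_samples = np.asarray(slope_samples, dtype=float)
    # Linear predictor on the log-odds scale: negative values are normal here.
    log_odds = intercept_samples + slope_samples * x
    # Inverse logit (sigmoid) maps log-odds into probabilities in (0, 1).
    probs = 1.0 / (1.0 + np.exp(-log_odds))
    # Average over posterior draws to get a single predictive probability.
    return float(probs.mean())
```

Keeping the per-draw probabilities (instead of averaging them) would also give a credible interval for the prediction.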

Related

Dealing with missing values in tensorflow

I need some guidance on the approach to imputation in TensorFlow/deep learning. I am familiar with how scikit-learn handles imputation, and when I map it to the TensorFlow ecosystem, I would expect to use preprocessing layers in Keras or functions in TensorFlow Transform to do the imputation. However, as far as I know, these functions do not exist. So I have a few questions:
1. Is there a reason tied to how deep learning works that these functions do not exist? (For example, dense sampling needs to be as accurate as possible and you have a large amount of data, so imputation is never required.)
2. If it is not #1, how should one handle imputation in TensorFlow? For example, during serving your input could be missing data, and there's nothing you can do about that. I would think integrating it into preprocessing_fn would be the thing to do.
3. Is it possible to have the graph do different things during training and serving? For example, train on data with no missing values, and if that situation is encountered during serving, do something like ignore the value or set it to a specified default.
Thank you!
Please refer to "Mean imputation for missing data" to impute missing values in your data with the mean.
In the example below, x is a feature represented as a tf.SparseTensor in the preprocessing_fn. To convert it to a dense tensor, we compute its mean and use that mean as the default value wherever the feature is missing from an instance.
Answering your third question, TensorFlow Transform builds transformations into the TensorFlow graph for your model, so the same transformations are performed at training and inference time.
For the use case you mention, the example below works because the default_value parameter supplies values for indices that are not specified in the sparse tensor. If default_value is not set, it defaults to zero.
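Example Code:
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    x = inputs['x']  # tf.SparseTensor (float feature); missing values are absent
    return {
        # Fill missing entries with the dataset-wide mean (tft.mean is an analyzer).
        'x_out': tft.sparse_tensor_to_dense_with_shape(
            x, default_value=tft.mean(x), shape=[None, 1])
    }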
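To make the training/serving point concrete, here is a framework-free NumPy sketch (not TensorFlow Transform's actual implementation) of what the preprocessing_fn above amounts to: the mean is computed once over the training data (in TFT, analyzers such as tft.mean run during the analyze phase), then frozen as a constant and reused to fill gaps in any later instance, including at serving time. Missing values are represented as NaN here, and analyze_mean and impute are hypothetical names.

```python
import numpy as np

def analyze_mean(train_values):
    """'Analyze' step: mean over the observed (non-NaN) training values."""
    values = np.asarray(train_values, dtype=float)
    return float(np.nanmean(values))

def impute(batch, fill_value):
    """'Transform' step: replace missing (NaN) entries with a frozen constant."""
    batch = np.asarray(batch, dtype=float)
    return np.where(np.isnan(batch), fill_value, batch)

# The mean is computed once from training data (observed values 1, 3, 5)...
train_mean = analyze_mean([1.0, np.nan, 3.0, 5.0])
# ...and the same constant fills gaps in data seen later, e.g. at serving time.
serving_batch = [np.nan, 10.0]
print(impute(serving_batch, train_mean))
```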

Loss function variational Autoencoder in Tensorflow example

I have a question regarding the loss function in a variational autoencoder. I followed the TensorFlow example https://www.tensorflow.org/tutorials/generative/cvae to create an LSTM-VAE for sampling a sine function.
My encoder input is a set of points (x_i, sin(x_i)) over a specific (randomly sampled) range, and I expect similar values as the decoder output.
The TensorFlow guide uses cross-entropy to compare the encoder input with the decoder output:
cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
This makes sense because the input and output are treated as probabilities. But in my case these values are points of my sine function, not probabilities.
Can't I simply use mean squared error instead of cross-entropy (I tried it and it works well), or does this cause the architecture to behave incorrectly at some point?
Best regards and thanks for your help!
Well, such questions happen when you work too much and stop thinking properly. To resolve this, it helps to think about what I am actually trying to do.
p(x|z) is the decoder's reconstruction distribution: given a sample z, it gives the probability of generating x. The TensorFlow example generates binarized images, where each pixel is modelled as a Bernoulli variable, so cross-entropy makes sense there. I simply want to minimize the distance between my input and output, so using MSE is logical; it corresponds to modelling p(x|z) as a Gaussian with fixed variance.
Hope that helps someone at some point.
Regards.
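A short check of why MSE is a legitimate reconstruction loss: if the decoder output x_hat is read as the mean of a Gaussian p(x|z) with a fixed standard deviation sigma, the log-likelihood is, up to an additive constant, the negative squared error scaled by 1/(2 sigma^2). The NumPy sketch below verifies this for sigma = 1; the function names are illustrative and not from the TensorFlow tutorial. In the tutorial's loss, a term like gaussian_log_px_z would take the place of the summed sigmoid cross-entropy used as log p(x|z).

```python
import numpy as np

def gaussian_log_px_z(x, x_hat, sigma=1.0):
    """log N(x; x_hat, sigma^2 I), summed over the last axis (one value per sample)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    log_norm = -0.5 * np.log(2.0 * np.pi * sigma ** 2)  # per-dimension constant
    return np.sum(log_norm - (x - x_hat) ** 2 / (2.0 * sigma ** 2), axis=-1)

def sum_squared_error(x, x_hat):
    """Squared reconstruction error, summed over the last axis."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    return np.sum(diff ** 2, axis=-1)

# Toy "reconstruction" of points on a sine curve (illustrative values only).
x = np.sin(np.linspace(0.0, np.pi, 5))
x_hat = x + 0.1
const = -0.5 * x.shape[-1] * np.log(2.0 * np.pi)  # constant term for sigma = 1

# With sigma = 1: log p(x|z) = const - 0.5 * SSE, so maximizing the
# likelihood is the same as minimizing the squared error.
print(np.isclose(gaussian_log_px_z(x, x_hat), const - 0.5 * sum_squared_error(x, x_hat)))
```

Because sigma is fixed, only the squared-error term depends on the decoder. The value of sigma does change how strongly reconstruction is weighted against the KL term, so plain MSE implicitly picks one particular weighting.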

Tensorflow bounded regression vs classification

As part of my master's thesis I have been tasked with predicting an integer label (0-255) that is a binned representation of an angle. The feature columns are also integers in the range 0-255.
So far I have used a custom TensorFlow layers estimator implementing a 256-output classifier, which performs well. However, my issue with this classification approach is the following:
My classification model treats predicting 3 for a true label of 28 as just as bad as predicting 27.
The interval or ordinal nature of my data (I'm not sure which) leads me to believe that regression would give fewer drastically incorrect predictions or outliers.
My goal:
to reduce the number of drastically incorrect predicted outliers
My questions:
1. Is regression the better approach, or can I improve my classification to include an ordinal/interval relationship between my labels?
2. If I choose regression, is there a way to bound my predicted output between 0 and 255? (I know I will have to round the predicted float values.)
Thanks in advance. Any other comments, suggestions or ideas to help me best tackle the problem are also very welcome.
If I made any incorrect assumptions or mistakes in my interpretation of the problem, feel free to correct me.
Question 1: Regression is the simpler approach. However, you can also use classification and modify the loss function so that misclassifications "close" to the true class incur a lower loss.
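One possible distance-aware loss, sketched in NumPy rather than TensorFlow to keep it self-contained: score the softmax output by its expected distance from the true bin, so putting probability mass on 27 when the label is 28 costs far less than putting it on 3. This is only one of several ways to make a classification loss respect label order, and expected_distance_loss is a hypothetical name.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def expected_distance_loss(logits, labels, num_classes=256):
    """Mean over the batch of sum_k p_k * |k - y|, with p = softmax(logits).

    logits: array of shape (batch, num_classes); labels: integer class ids.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    classes = np.arange(num_classes)
    # |k - y| for every class k and every example's true label y.
    dist = np.abs(classes[None, :] - np.asarray(labels)[:, None])
    return float((probs * dist).sum(axis=-1).mean())
```

For angle bins, the absolute difference could be replaced by a wrap-around distance, as discussed under Question 2.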
Question 2: The TensorFlow function for bounding your prediction is tf.clip_by_value. Are you mapping all 360 degrees to [0, 255]? In that case you will want to consider the boundary cases: for example, your estimator yields -4 and the true value is 252, but they actually represent the same angle, so the loss should be 0.
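A NumPy sketch of both points, assuming 256 bins that cover the full circle, with np.clip standing in for tf.clip_by_value (same element-wise clipping): bound_prediction clips and rounds a regression output into [0, 255], and circular_bin_distance treats bins on either side of the 0/255 boundary as neighbours, so -4 against a true bin of 252 gives a distance of 0. Both function names are hypothetical.

```python
import numpy as np

NUM_BINS = 256  # bins 0..255, assumed to cover the full 360 degrees

def bound_prediction(pred, num_bins=NUM_BINS):
    """Clip a real-valued prediction into [0, num_bins - 1] and round to a bin."""
    return np.rint(np.clip(pred, 0, num_bins - 1)).astype(int)

def circular_bin_distance(pred, true, num_bins=NUM_BINS):
    """Smallest number of bins between pred and true, going either way round."""
    d = np.mod(np.asarray(pred, dtype=float) - np.asarray(true, dtype=float), num_bins)
    return np.minimum(d, num_bins - d)
```

For a circular quantity, a distance like this (used in the loss) is usually a better fit than clipping alone, since clipping piles out-of-range predictions up at 0 and 255 even when they are close to the true angle across the boundary.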

Is it possible to extract confidence values for regression predictions in tensorflow?

Can I extract confidence values, or the variance of the prediction error, from a TensorFlow regressor? For example, if the model gives a prediction x, can I get a confidence band, e.g. whether x is within ±25% of the actual value?
I'm afraid it's not as easy as when using softmax in the output layer. As described here, you can use the MSE of the NN on the validation set as an estimate of the error variance, then apply your desired confidence level. Be aware that this approach makes strong assumptions (e.g. that the error distribution is always the same, which may not be true), so if you really need confidence intervals, a regression NN is not the best fit for you.
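A minimal sketch of the validation-MSE approach, assuming the validation errors are roughly Gaussian, zero-mean and of constant variance for every input (exactly the assumptions the answer warns about). The square root of the validation MSE is used as the error standard deviation, and the interval is prediction ± z · sigma. interval_from_validation_mse is a hypothetical name; the predictions would come from your trained regressor.

```python
import numpy as np
from statistics import NormalDist

def interval_from_validation_mse(y_val, pred_val, new_preds, confidence=0.95):
    """Symmetric interval new_preds +/- z * sqrt(validation MSE).

    Only meaningful if validation errors are roughly Gaussian, zero-mean and
    of the same spread for every input.
    """
    residuals = np.asarray(y_val, dtype=float) - np.asarray(pred_val, dtype=float)
    sigma = np.sqrt(np.mean(residuals ** 2))  # RMSE as the error std estimate
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    new_preds = np.asarray(new_preds, dtype=float)
    return new_preds - z * sigma, new_preds + z * sigma
```

Usage would look like lower, upper = interval_from_validation_mse(y_val, model_preds_on_val, model_preds_on_new_data). The band has the same absolute width for every prediction, so a relative check such as ±25% of the actual value has to be derived from it separately.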

xgboost using the auc metric correctly

I have a slightly imbalanced dataset for a binary classification problem, with a positive to negative ratio of 0.6.
I recently learned about the auc metric from this answer: https://stats.stackexchange.com/a/132832/128229, and decided to use it.
But I came across another link, http://fastml.com/what-you-wanted-to-know-about-auc/, which claims that AUC-ROC is insensitive to class imbalance and that we should use the AUC of the precision-recall curve instead.
The xgboost docs are not clear on which AUC they use; do they use AUC-ROC?
The link also mentions that AUC should only be used if you care about the ranking and not about the probabilities.
However, since I am using a binary:logistic objective, I think I should care about probabilities, because I have to set a threshold for my predictions.
The xgboost parameter tuning guide https://github.com/dmlc/xgboost/blob/master/doc/how_to/param_tuning.md also suggests an alternative way to handle class imbalance: not balancing positive and negative samples and using max_delta_step = 1.
So can someone explain when AUC is preferred over the other method for handling class imbalance in xgboost? And if I am using AUC, what threshold do I need to set for prediction, or more generally, how exactly should I use AUC for an imbalanced binary classification problem in xgboost?
EDIT:
I also need to eliminate false positives more than false negatives. How can I achieve that with the binary:logistic objective, apart from simply varying the threshold?
According to the xgboost parameters section here, there are both auc and aucpr, where pr stands for precision-recall.
I would say you can build some intuition by running both approaches and seeing how the metrics behave. You can include multiple metrics and even optimize with respect to whichever you prefer.
You can also monitor the false positive rate in each boosting round by creating a custom metric.
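A sketch of both suggestions. The parameter dict lists the built-in auc (ROC) and aucpr (precision-recall) metrics so both are reported every round; false_positive_rate is a plain NumPy function, and fpr_metric wraps it in the (predictions, DMatrix) -> (name, value) form XGBoost uses for custom evaluation metrics. The 0.5 threshold and all function names are assumptions for illustration, and whether predt holds probabilities or raw margins depends on the XGBoost version and on whether a custom objective is used.

```python
import numpy as np

# Both ROC AUC ("auc") and precision-recall AUC ("aucpr") are built-in
# eval_metric values; listing both reports both on every boosting round.
params = {
    "objective": "binary:logistic",
    "eval_metric": ["auc", "aucpr"],
}

def false_positive_rate(y_true, y_prob, threshold=0.5):
    """FP / (FP + TN) for probabilities thresholded at `threshold`."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    negatives = ~y_true
    n_neg = negatives.sum()
    if n_neg == 0:
        return 0.0
    return float((y_pred & negatives).sum() / n_neg)

def fpr_metric(predt, dtrain):
    """Custom metric: returns ("fpr", value) for one evaluation set.

    Assumes predt holds probabilities; if your setup passes raw margins,
    apply a sigmoid before thresholding.
    """
    return "fpr", false_positive_rate(dtrain.get_label(), predt)
```

fpr_metric would be passed to xgb.train together with params (through custom_metric in recent releases, feval in older ones; check the docs for your installed version). When eval_metric lists several built-in metrics, early stopping uses the last one in the list.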
XGBoost chose to write AUC for the area under the ROC curve, but some prefer to be more explicit and say AUC-ROC / ROC-AUC.
https://xgboost.readthedocs.io/en/latest/parameter.html