what does tfma.metrics.MeanLabel do? - tensorflow

Can someone explain to me what tfma.metrics.MeanLabel does, how it should be used, and what the difference is between tfma.metrics.MeanLabel, tfma.metrics.MeanPrediction, and tfma.metrics.MeanAttributions? I am not sure why there is no explanation of these functions and the job that they do. How can I understand the details about them?
I appreciate it if someone can explain the job of these metrics.
Thanks

TFMA provides support for calculating metrics that were used at training time (i.e. built-in metrics) as well as metrics defined after the model was saved, as part of the TFMA configuration settings.
tfma.metrics.* consists of Standard TFMA metrics and plots.
tfma.metrics.MeanLabel calculates the mean label as the ratio of total weighted labels to total weighted examples.
tfma.metrics.MeanPrediction calculates the mean prediction as the ratio of total weighted predictions to total weighted examples.
tfma.metrics.MeanAttributions calculates the mean attributions, i.e. the average contribution of each input feature to the predictions made by the model.
These metrics are specified in the metrics_specs section of tfma.EvalConfig, which holds specifications for the model, metrics, and slices that are to be evaluated. Please refer to the TFMA tutorial for a better understanding of how to use these metrics.
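For illustration, here is a minimal sketch of how these metrics might be listed in metrics_specs; the label_key and the particular choice of metrics are assumptions about an example model, not a complete configuration:

```python
import tensorflow_model_analysis as tfma

# Hedged sketch: 'label' is a placeholder label key for an example model.
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(class_name='MeanLabel'),
            tfma.MetricConfig(class_name='MeanPrediction'),
        ])
    ],
    # An empty SlicingSpec means "evaluate on the whole dataset".
    slicing_specs=[tfma.SlicingSpec()],
)
```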
Hope this helps. Thank you!

Related

Can you save the inferences/predictions in a list and run the Adam optimizer on them afterwards?

I am new to TensorFlow, and in my current project I can't immediately calculate the loss after a prediction/inference, but rather only every 2 or 3 predictions, so I was thinking of saving the tensors of each prediction in a list and running them through the optimizer afterwards.
I am not very familiar with TensorFlow yet, so if there is no way to do this, other ways to tackle the problem are welcome.
Thanks in advance for your help !
Why can't you calculate the loss after each prediction?
If I understand your question right, your situation is very similar to an in-graph distributed model: let each GPU/server compute a batch, gather all the inferences and losses, compute their average, and then update the variables.
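As a rough illustration of that idea in TF 2 (not the asker's actual code), the sketch below keeps several predictions inside one GradientTape, combines their losses, and applies Adam once; model and loss_fn are placeholders for whatever the asker is using:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()

def train_on_group(model, loss_fn, inputs_group, targets_group):
    """Run several predictions, compute one combined loss, apply Adam once."""
    with tf.GradientTape() as tape:
        # Keep every prediction from the group inside the same tape so the
        # loss computed every 2-3 inferences can still be differentiated
        # through all of them.
        predictions = [model(x, training=True) for x in inputs_group]
        losses = [loss_fn(y, p) for y, p in zip(targets_group, predictions)]
        loss = tf.add_n(losses) / len(losses)   # average over the group
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```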

Is there no precise implementation of batch normalization in tensorflow and why?

What batch normalization does at the inference phase, precisely, is normalize each layer with the population mean and an estimated population variance.
But it seems every TensorFlow implementation (including this one and the official TensorFlow implementation) uses an (exponential) moving average of the mean and variance instead.
Please forgive me, but I don't understand why. Is it because using moving average is just better for performance? Or for a pure computational speed sake?
Reference: the original paper
The exact update rule for the sample mean is just exponential averaging with a step equal to the inverse sample size. So, if you know the sample size, you could just set the decay factor to 1/n, where n is the sample size. However, the decay factor usually does not matter much if it is chosen very close to one, as exponential averaging with such a decay rate still provides a very close approximation of the mean and variance, especially on large datasets.
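A tiny numerical sketch of this point (synthetic data and an arbitrary decay value, purely for illustration): a running average with step 1/n reproduces the exact sample mean, while a fixed decay close to one tracks it closely on a large sample:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=10_000)

exact_mean = 0.0
ema_mean = 0.0
decay = 0.999  # a typical batch-norm momentum value

for n, x in enumerate(data, start=1):
    exact_mean += (x - exact_mean) / n              # step 1/n -> exact sample mean
    ema_mean = decay * ema_mean + (1 - decay) * x   # fixed-decay moving average

print(exact_mean, ema_mean, data.mean())  # the three values end up very close
```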

What is the average log-perplexity in seq2seq modules in tensorflow?

The output of the following TensorFlow function should give the average log-perplexity. I went through the source code, but I don't understand how they calculate that loss.
tf.contrib.legacy_seq2seq.sequence_loss(logits, targets, weights, average_across_timesteps=True, average_across_batch=True, softmax_loss_function=None, name=None)
I went through the TensorFlow implementation. Though perplexity has a broader meaning, in this function perplexity means the exponential of your average cross-entropy loss (equivalently, two to the power of that loss when the cross-entropy is measured in bits).
Please refer to the first answer of this question.
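A hedged sketch of that relationship using current TF ops (the deprecated tf.contrib.legacy_seq2seq is replaced here by a plain weighted cross-entropy average; all shapes are made up):

```python
import tensorflow as tf

logits = tf.random.normal([4, 7, 1000])            # [batch, time, vocab] - made-up shapes
targets = tf.random.uniform([4, 7], maxval=1000, dtype=tf.int32)
weights = tf.ones([4, 7])                          # per-token weights, e.g. a padding mask

# Natural-log cross-entropy per target token, averaged with the weights,
# which is roughly what sequence_loss returns with both averaging flags on.
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits)
avg_loss = tf.reduce_sum(crossent * weights) / tf.reduce_sum(weights)

perplexity = tf.exp(avg_loss)                      # exp because the loss is in nats
print(float(avg_loss), float(perplexity))
```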

what is the difference between sampled_softmax_loss and nce_loss in tensorflow?

I notice there are two functions for negative sampling in TensorFlow to compute the loss (sampled_softmax_loss and nce_loss). The parameters of these two functions are similar, but I really want to know: what is the difference between the two?
Sampled softmax is all about selecting a sample of the given size and computing the softmax loss over it. The main objective is to make the result of the sampled softmax approximately equal to the true softmax, so the algorithm concentrates largely on how those samples are selected from the given distribution.
On the other hand, NCE loss is more about selecting noise samples and trying to mimic the true softmax. It takes only one true class and K noise classes.
Sampled softmax tries to normalise over all samples in your output. If your output has a non-uniform distribution (log-uniform over your labels), this is not an optimal loss function. Note that although they have the same parameters, the way you use the functions is different. Take a look at the documentation here: https://github.com/calebchoo/Tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard4/tf.nn.nce_loss.md and read this line:
By default this uses a log-uniform (Zipfian) distribution for sampling, so your labels must be sorted in order of decreasing frequency to achieve good results. For more details, see log_uniform_candidate_sampler.
Take a look at this paper where they explain why they use it for word embeddings: http://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf
Hope this helps!
Check out this documentation from TensorFlow https://www.tensorflow.org/extras/candidate_sampling.pdf
They seem pretty similar, but sampled softmax is only applicable for a single label while NCE extends to the case where your labels are a multiset. NCE can then model the expected counts rather than presence/absence of a label. I'm not clear on an exact example of when to use the sampled_softmax.
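For comparison, a minimal sketch calling both functions with the same (made-up) weights, biases, labels and inputs, as in a word-embedding setup; both return per-example losses that you then average:

```python
import tensorflow as tf

# Made-up sizes for illustration only.
vocab_size, embed_dim, batch = 10_000, 128, 32
weights = tf.Variable(tf.random.normal([vocab_size, embed_dim]))
biases = tf.Variable(tf.zeros([vocab_size]))
hidden = tf.random.normal([batch, embed_dim])   # e.g. context embeddings
labels = tf.random.uniform([batch, 1], maxval=vocab_size, dtype=tf.int64)

# Both functions share the same signature; only the underlying objective differs.
nce = tf.nn.nce_loss(weights=weights, biases=biases, labels=labels,
                     inputs=hidden, num_sampled=64, num_classes=vocab_size)
sampled = tf.nn.sampled_softmax_loss(weights=weights, biases=biases, labels=labels,
                                     inputs=hidden, num_sampled=64,
                                     num_classes=vocab_size)

loss_nce = tf.reduce_mean(nce)          # per-example losses -> scalar
loss_sampled = tf.reduce_mean(sampled)
```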

xgboost using the auc metric correctly

I have a slightly imbalanced dataset for a binary classification problem, with a positive to negative ratio of 0.6.
I recently learned about the auc metric from this answer: https://stats.stackexchange.com/a/132832/128229, and decided to use it.
But I came across another link http://fastml.com/what-you-wanted-to-know-about-auc/ which claims that AUC-ROC is insensitive to class imbalance, and that we should instead use the AUC of a precision-recall curve.
The xgboost docs are not clear on which AUC they use. Do they use AUC-ROC?
Also the link mentions that AUC should only be used if you do not care about the probability and only care about the ranking.
However, since I am using a binary:logistic objective, I think I should care about probabilities, since I have to set a threshold for my predictions.
The xgboost parameter tuning guide https://github.com/dmlc/xgboost/blob/master/doc/how_to/param_tuning.md
also suggests an alternate method to handle class imbalance, by not balancing positive and negative samples and using max_delta_step = 1.
So can someone explain when AUC is preferred over the other method for handling class imbalance in xgboost? And if I am using AUC, what threshold do I need to set for prediction, or more generally, how exactly should I use AUC for handling an imbalanced binary classification problem in xgboost?
EDIT:
I also need to eliminate false positives more than false negatives; how can I achieve that, apart from simply varying the threshold, with the binary:logistic objective?
According to the xgboost parameters section here, there are both auc and aucpr, where pr stands for precision-recall.
I would say you could build some intuition by running both approaches and seeing how the metrics behave. You can include multiple metrics and even optimize with respect to whichever you prefer.
You can also monitor the false positive rate in each boosting round by creating a custom metric.
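As a rough sketch of these suggestions (random data, an arbitrary 0.7 threshold, not a tuned setup), you can monitor both auc and aucpr during training and then pick your own decision threshold on the predicted probabilities afterwards:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (rng.random(1000) < 0.375).astype(int)      # roughly a 0.6 pos:neg ratio
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

params = {
    'objective': 'binary:logistic',
    'eval_metric': ['auc', 'aucpr'],   # ROC AUC and precision-recall AUC
    'max_delta_step': 1,               # the alternative from the tuning guide
}
booster = xgb.train(params, dtrain, num_boost_round=50,
                    evals=[(dvalid, 'valid')])

# binary:logistic predicts probabilities; raising the threshold above 0.5
# trades false positives for false negatives.
proba = booster.predict(dvalid)
preds = (proba > 0.7).astype(int)   # 0.7 is an arbitrary example threshold
```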
XGBoost chose to write AUC (area under the ROC curve), but some prefer to be more explicit and say AUC-ROC / ROC-AUC.
https://xgboost.readthedocs.io/en/latest/parameter.html