xgb2sql output not matching the predict function output for XGBoost

The xgb2sql package is used to generate the SQL version of an XGBoost model. I am trying to match its output with the output of the predict function called on the xgboost model object, but the two results don't match. What could be the reason? Both outputs should be probabilities and should agree.

Related

TF Serving Predict API Output Interpretation

Is the TensorFlow Serving (TFS) Predict API output the same as the output of the tf.keras.model.predict method (i.e. the outputs of the model according to the compiled metrics)?
For example, if we have a tf.keras.model compiled with BinaryAccuracy metric, will the output of the TFS predict API be a list of binary accuracy values for each one of the inputs of the predict request?
Thanks in advance!
I am not able to clearly follow your question about compiled metrics and the model's output prediction, but here is a comparison of the outputs from the Keras predict method and TF Serving's Predict API.
The prediction output format is the same for both Keras and the TF Serving Predict API: each emits, for every data point, a list of probabilities of that point belonging to each class.
Consider a 10-class classification model to which you send 4 data points via the predict method: the output will have shape 4x10, where each row contains the probability of that data point belonging to each of the classes (0–9).
Here's a sample prediction:
predictions = [
[8.66183618e-05 1.06925681e-05 1.40683464e-04 4.31487868e-09
7.31811961e-05 6.07917445e-06 9.99673367e-01 7.10965661e-11
9.43153464e-06 1.98050812e-10],
[6.35617238e-04 9.08200348e-10 3.23482091e-05 4.98994159e-05
7.29685112e-08 4.77315152e-05 4.25152575e-06 4.23201502e-10
9.98981178e-01 2.48882337e-04],
[9.99738038e-01 3.85520025e-07 1.05982785e-04 1.47284098e-07
5.99268958e-07 2.26216093e-06 1.17733900e-04 2.74483864e-05
3.30203284e-06 4.03360673e-06],
[3.42538192e-06 2.30619257e-09 1.29460409e-06 7.04832928e-06
2.71432992e-08 1.95419183e-03 9.96945918e-01 1.80040043e-12
1.08795590e-03 1.78136176e-07]]
You can take a look at the output of the make_prediction() function in this reference to understand how the Predict API in TF Serving works. Thank you!
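For concreteness, here is a minimal sketch comparing the two calls. The SavedModel path, model name, serving port, and input shape below are assumptions for illustration, not taken from the question:
import json
import numpy as np
import requests
import tensorflow as tf

model = tf.keras.models.load_model("my_model")   # hypothetical path to the SavedModel
x = np.random.rand(4, 28, 28)                    # 4 illustrative inputs for a 10-class model

# 1) In-process Keras prediction
probs_keras = model.predict(x)                   # shape (4, 10): per-class probabilities

# 2) TF Serving REST Predict API serving the same SavedModel
resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",   # hypothetical model name and port
    data=json.dumps({"instances": x.tolist()}),
)
probs_tfs = np.array(resp.json()["predictions"]) # same (4, 10) probability matrix
The two arrays should agree up to floating-point serialization, since TF Serving runs the same SavedModel graph as the in-process Keras call.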

How to predict on a test sequence using a distilbert model?

I'm trying to predict on a test sequence using Ktrain with a DistilBERT model; my code looks like this:
trn, val, preproc = text.texts_from_array(x_train=x_train, y_train=y_train,
                                          x_test=x_test, y_test=y_test,
                                          class_names=train_b.target_names,
                                          preprocess_mode='distilbert',
                                          maxlen=350)
model = text.text_classifier('distilbert', train_data=trn, preproc=preproc, multilabel=True)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=64)
y_pred = learner.model.predict(val, verbose=0)
In the other implementations of models like nbsvm, fasttext, and bigru in Ktrain this is quite easy, since the texts_from_array function returns a numpy array, but with distilbert it returns a TransformerDataset. It is therefore not possible to predict on a sequence with learner.model.predict(), as it raises a Python index exception. It is also not possible for me to use the validate() method to generate a confusion matrix, given that I have a multi-label classification problem. My question is: how can I test on a test sequence with Ktrain using distilbert? My need for this comes from the fact that my metric function is implemented with the sklearn.metrics library and it needs the test and validation sequences in numpy format.
You can use a Predictor instance as shown in the tutorial.
The Predictor simply uses the preproc object to transform the raw text into the format expected by the model and feeds this to the model.
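As a minimal sketch (assuming x_test is the raw list of texts from the question), the Predictor can be created and applied like this:
import ktrain

# wrap the trained model together with the preprocessing object
predictor = ktrain.get_predictor(learner.model, preproc)

# predict() accepts raw text, so x_test can stay a plain list of strings
y_pred = predictor.predict(x_test)
The returned predictions are plain Python/numpy objects, so they can be passed directly to sklearn.metrics functions.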

Extract the output of the embedding layer

I am trying to build a regression model, for which I have a nominal variable with very high cardinality. I am trying to get the categorical embedding of the column.
Input:
df["nominal_column"]
Output:
the embeddings of the column.
I want to use the output of the embedding layer alone, since I would require it as an input to my traditional regression model. Is there a way to extract that output alone?
P.S. I am not asking for code; any suggestion on the approach would be great.
If the embedding is part of the model and you train it, then you can use the functional API of Keras to get the output of any intermediate operation in your graph:
from tensorflow.keras.layers import Embedding, Input
from tensorflow.keras.models import Model

x = Input((number_of_categories,))
y = Embedding(parameters_of_your_embeddings)(x)  # placeholder arguments for your embedding
output = Rest_of_your_model()(y)                 # placeholder for the rest of the network
model = Model(inputs=[x], outputs=[output, y])   # expose the embedding output as a second output
If you do this before you train the model, you'll have to define a custom loss function that deals only with part of the output. The other way is to train the model with just one output, then create an identical model with two outputs and set the weights of the second model from the trained one.
If you want to get the embedding matrix from your model, you can just use the get_weights method of the embedding layer, which returns the weights as a numpy array.
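For example, a minimal sketch, assuming the embedding layer was given the name "nominal_embedding" when the model was built:
# retrieve the trained embedding matrix from the fitted model
embedding_layer = model.get_layer("nominal_embedding")
embedding_matrix = embedding_layer.get_weights()[0]   # shape: (cardinality, embedding_dim)

# the learned vector for a single category index
category_vector = embedding_matrix[category_index]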

xgboost.train probability output needed

XGBClassifier outputs probabilities if we use the predict_proba method; however, when I train the model using xgboost.train, I cannot figure out how to get probabilities as output. Here is a chunk of my code:
dtrain=xgb.DMatrix(X_train, label=y)
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic'}
modelXG=xgb.train(param,dtrain,xgb_model='xgbmodel')
xgboost.train() returns an xgb.Booster object. For a classification problem, the xgb.Booster.predict() call returns probabilities instead of the class labels you may expect if you are used to the .predict() methods of sklearn models. So the modelXG.predict(dtest) call will give you what you need.
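A minimal sketch, assuming X_test has been prepared the same way as X_train:
import numpy as np
import xgboost as xgb

dtest = xgb.DMatrix(X_test)
probs = modelXG.predict(dtest)        # probability of the positive class for each row
labels = (probs > 0.5).astype(int)    # threshold to recover hard class labels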

How to get the output of the final layer of the model in CNTK?

How do I get the output of the model?
I'm writing a classifier using CNTK, and I want to print out the probability distribution (final output) so I can manually evaluate my results. How can I do that?
Right now, for evaluation, I use the evaluation methods provided in CNTK, which don't require me to get the output of the model.
Thanks a bunch!
If you have a model function z, you can convert the model output to probabilities using the softmax function:
C.softmax(z).eval()
You can pass the necessary data to the eval function.
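A minimal sketch, where z is the trained model function and features is the input variable it was built on (both names are assumptions based on the question):
import cntk as C

# apply softmax on top of the network output and evaluate it on a batch of test data
probs = C.softmax(z).eval({features: test_batch})   # each row sums to 1
predicted_classes = probs.argmax(axis=1)            # highest-probability class per sample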