I am trying to use the What-If Tool on my XGBoost model.
However, on the linked page I can only find examples of XGBoost used through Google AI Platform. Is there any way to use the What-If Tool with XGBoost without Google AI Platform?
I tried the functions used in the TensorFlow and Keras examples, namely set_estimator_and_feature_spec and set_compare_custom_predict_fn:
import xgboost as xgb
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

bst = xgb.XGBClassifier(objective='reg:logistic')
bst.fit(x_train, y_train)

# Convert the test DataFrame to a list of tf.Example protos (WIT helper function).
test_examples = df_to_examples(df_test)
config_builder = WitConfigBuilder(test_examples).set_custom_predict_fn(bst.predict)
WitWidget(config_builder)
When I try to run inference, an error message is displayed: "cannot initialize DMatrix from a list", and I am unable to get past it.
After a lot of trial and error I finally got it to work with XGBoost using the following code:
# The first argument in my case is a Pandas DataFrame of all features plus the target 'label';
# extract just the numpy array with 'values', then convert that to a list.
# The second argument is the list of column names.
# I use an sklearn Pipeline, so here I am just accessing the classifier step, which is an XGBClassifier instance.
# The What-If Tool expects a 2D array, where the 1st dimension is each sample and the 2nd dimension is the
# probability of each class; so use 'predict_proba' rather than 'predict'.
config_builder = (WitConfigBuilder(df_sample.values.tolist(), df_sample.columns.tolist())
                  .set_custom_predict_fn(clf['classifier'].predict_proba)
                  .set_target_feature('label')
                  .set_label_vocab(['No Churn', 'Churn']))
This eliminated the need for the suggested helper functions and works out of the box with Pandas DataFrames and scikit-learn models.
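If useful, here is a minimal usage sketch for displaying the tool once the config is built, assuming the witwidget notebook extension is installed (the height argument, in pixels, is optional):
from witwidget.notebook.visualization import WitWidget

# Render the What-If Tool in the notebook using the config built above.
WitWidget(config_builder, height=800)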
How should I use a tf.data.Dataset in order to run model.predict(data) and still have access to the other features of the dataset?
For example, my tf.data.Dataset has this format:
(tensor<(100,224,224,3)>, tensor<(100,)>) -> preprocessed images as tf.float32, uuids of the images as tf.string
If I extract the feature vectors like this:
for image_data, uuids in ds.batch(100):
    features = model.predict(image_data)
I get an array of features. At this point, features is a NumPy array of shape (100, 2048) and uuids is a tf.string tensor of shape (100,).
How can I combine them in order to write the feature vectors to disk?
From my understanding, I need both of them in the same format: either both as tensors, so I can keep using TF code and save the feature vectors as a TFRecord, or extract the uuid as a plain string from the uuid tensor, so I can use Python code and save the array to a file with numpy.tofile.
So my questions are:
- How can I make the features a tensor?
- Or can I get the string value out of the uuid tensor?
- Does anything sound wrong in what I am trying to do? Is there a more optimal way to create the input pipeline? Or did I misunderstand the usage of the Keras API and tf.data?
If I use a plain Python pipeline I can successfully save the array to a file, but I would like to use tf.data because I expect it to be faster and better optimized thanks to its parallel map function, batching, and autotuning of the parallel calls.
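For what it's worth, a minimal sketch of one way to pair each feature vector with its uuid, assuming TF 2.x eager execution (the output directory and batch size are placeholders): model.predict returns a NumPy array, and calling .numpy() on the uuid tensor yields byte strings that can be decoded to Python strings.
import numpy as np

for image_data, uuids in ds.batch(100):
    features = model.predict(image_data)              # NumPy array, shape (batch, 2048)
    ids = [u.decode('utf-8') for u in uuids.numpy()]  # byte strings -> Python strings
    for uid, vec in zip(ids, features):
        # Hypothetical layout: one .npy file per image, named by its uuid.
        np.save(f'features/{uid}.npy', vec)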
I would like to know how to extract, from a model's events file, the same performance results that TensorBoard displays: the Precision, Recall, and Loss numbers are of most interest. Here is a subset of them as displayed in TensorBoard for the model checkpoint directory.
I'm not sure if there is self-documenting information or other metadata available for these models. This one in particular is Faster R-CNN Inception, but are these outputs tied to a particular model or are they generic in format?
Found the approach in the tensorboard package:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# 'evtf' is the path to the TF events file written during training.
event_acc = EventAccumulator(evtf)
event_acc.Reload()
One of the entries is:
scal_losses = event_acc.Scalars('Loss/total_loss')
From that list we can extract such attributes as Step [number] and Value (of the loss):
losses = sorted([[sevt.step, sevt.value] for sevt in scal_losses])
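To discover which scalar tags are available in a given events file (tag names vary by model and training pipeline, so 'Loss/total_loss' above is an example rather than a guarantee), the accumulator can be queried first; a small sketch:
# Tags() returns a dict keyed by category ('scalars', 'images', 'histograms', ...);
# the 'scalars' entry lists the tag names that can be passed to event_acc.Scalars().
print(event_acc.Tags()['scalars'])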
Using TF 2.0 and TensorFlow Probability layers, I have constructed a keras.Sequential model. I would like to export it for serving with TensorFlow Serving, and I would like to include the preprocessing and post-processing steps in the servable.
My preprocessing steps are fairly simple: fill NAs with explicit values, encode a few strings as floats, normalize inputs, and denormalize outputs. For training, I have been doing the pre/post-processing with pandas and numpy.
I know that I can export my Keras model's weights, wrap the keras.Sequential model's architecture in a bigger TensorFlow graph, use low-level ops like tf.math.subtract(inputs, vector_of_feature_means) to do the pre/post-processing operations, define tf.placeholders for my inputs and outputs, and make a servable, but I feel like there has to be a cleaner way of doing this.
Is it possible to use keras.layers.Add() and keras.layers.Multiply() in a keras.Sequential model for explicit preprocessing steps, or is there some more standard way of doing these things?
The standard and efficient way of doing these things, as per my understanding, is to use TensorFlow Transform. Using TF Transform doesn't mean we have to adopt the entire TFX pipeline; TF Transform can be used standalone as well.
TensorFlow Transform creates a Beam transformation graph and injects these transformations as constants into the TensorFlow graph. Because the transformations are represented as constants in the graph, they will be consistent across training and serving. The advantages of that consistency are:
Eliminates training-serving skew
Eliminates the need for preprocessing code in the serving system, which improves latency
Sample code for TF Transform is shown below.
Code for importing all the dependencies:
try:
  import tensorflow_transform as tft
  import apache_beam as beam
except ImportError:
  print('Installing TensorFlow Transform. This will take a minute, ignore the warnings')
  !pip install -q tensorflow_transform
  print('Installing Apache Beam. This will take a minute, ignore the warnings')
  !pip install -q apache_beam
  import tensorflow_transform as tft
  import apache_beam as beam

import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import dataset_schema
Below is the preprocessing function where we specify all the transformations. As of now, TF Transform doesn't provide a direct API for missing-value imputation, so only for that do we have to write our own code using low-level APIs.
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs`.
  outputs = inputs.copy()

  # Scale numeric columns to have range [0, 1].
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(outputs[key])

  for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
    # This is a SparseTensor because it is optional. Here we fill in a default
    # value when it is missing.
    dense = tf.sparse_to_dense(outputs[key].indices,
                               [outputs[key].dense_shape[0], 1],
                               outputs[key].values, default_value=0.)
    # Reshaping from a batch of vectors of size 1 to a batch of scalars.
    dense = tf.squeeze(dense, axis=1)
    outputs[key] = tft.scale_to_0_1(dense)

  # For all categorical columns except the label column, we generate a
  # vocabulary but do not modify the feature. This vocabulary is instead
  # used in the trainer, by means of a feature column, to convert the feature
  # from a string to an integer id.
  for key in CATEGORICAL_FEATURE_KEYS:
    tft.vocabulary(inputs[key], vocab_filename=key)

  # For the label column we provide the mapping from string to index.
  table = tf.contrib.lookup.index_table_from_tensor(['>50K', '<=50K'])
  outputs[LABEL_KEY] = table.lookup(outputs[LABEL_KEY])

  return outputs
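To actually run this preprocessing_fn, it is typically wrapped in a Beam pipeline via tft_beam.AnalyzeAndTransformDataset. A minimal sketch, assuming raw_data is an in-memory list of dicts and the feature spec below matches your columns (the feature names here are placeholders, and the exact schema helper may vary with the TF Transform version):
import tempfile

# Hypothetical feature spec describing the raw columns.
raw_data_metadata = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec({
        'age': tf.FixedLenFeature([], tf.float32),
        'education': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.string),
    }))

# Analyze the data and apply the transformations in one pass.
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
  transformed_dataset, transform_fn = (
      (raw_data, raw_data_metadata)
      | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

transformed_data, transformed_metadata = transformed_dataset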
You can refer to the links below for detailed information and for the TF Transform tutorial:
https://www.tensorflow.org/tfx/transform/get_started
https://www.tensorflow.org/tfx/tutorials/transform/census
I'm relatively new to TensorFlow and I'm having trouble modifying some of the examples to use batch/stream processing with input functions. More specifically, what is the 'best' way to modify this script to make it suitable for training and serving deployment on Google Cloud ML?
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py
Something akin to this example:
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census/estimator/trainer
I can package it up and train it in the cloud, but I can't figure out how to apply even the simple vocab_processor transformations to an input tensor. I know how to do it with pandas, but there I can't apply the transformation to batches (using the chunk_size parameter). I would be very happy if I could reuse my pandas preprocessing pipelines in TensorFlow.
I think you have 3 options:
1) You cannot reuse pandas preprocessing pipelines in TF directly. However, you could start TF from the output of your pandas preprocessing: build a vocab, convert the text words to integers, and save a new preprocessed dataset to disk. Then read the integer data (which encodes your text) in TF to do training.
2) You could build a vocab outside of TF in pandas. Then inside TF, after reading the words, you can make a lookup table to map the text to integers (see the sketch after this list). But if you are going to build a vocab outside of TF, you might as well do the transformation at the same time outside of TF, which is option 1.
3) Use tensorflow_transform. You can call tft.string_to_int() on the text column to automatically build the vocab and convert the text to integers. The output of tensorflow_transform is preprocessed data in tf.Example format, so training can start from the tf.Example files. This is again option 1, but with tf.Example files. If you want to run prediction on raw text data, this option lets you export a graph with the same text preprocessing built in, so you don't have to manage the preprocessing step at prediction time. However, this option is the most complicated, as it introduces two additional ideas: tf.Example files and Beam pipelines.
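A minimal sketch of option 2, assuming a vocabulary file 'vocab.txt' (one token per line) built offline with pandas, and TF 1.x with tf.contrib.lookup; the file name and example words are illustrative only:
import tensorflow as tf

# Map each word to its integer id; unseen words fall into one OOV bucket.
table = tf.contrib.lookup.index_table_from_file(
    vocabulary_file='vocab.txt', num_oov_buckets=1)
words = tf.constant([['the', 'quick', 'fox'], ['hello', 'world', 'padding']])
word_ids = table.lookup(words)

with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    print(sess.run(word_ids))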
For examples of tensorflow_transform see https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/criteo_tft
and
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/reddit_tft
I have string labels such as "cat" and "dog". Can I feed string labels directly to deep learning models in TensorFlow and get string labels back as predictions? I am looking for the equivalent of sklearn's LabelEncoder (sklearn.preprocessing.LabelEncoder).
If this is not possible, is there a way to pack the labels into the SavedModel protobuf file and retrieve them by index at serving time? I am using the Estimator's export_savedmodel API. Is assets_extra the right way? The approach at https://github.com/tensorflow/serving/issues/55 does not use the SavedModel format.
The typical way to handle label data in deep learning is to embed the labels in a vector space. Language models do it routinely with word embeddings. TensorFlow provides embedding lookup operations that you can use for your purposes.
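A minimal sketch of such an embedding lookup in TF 1.x (the sizes, variable name, and label ids are illustrative only): string labels are first mapped to integer ids, and those ids are then used to look up rows of a trainable embedding matrix.
import tensorflow as tf

label_ids = tf.constant([0, 1, 1, 0])  # e.g. 0 = "cat", 1 = "dog"
# A trainable embedding matrix: one 8-dimensional vector per label.
label_embeddings = tf.get_variable('label_embeddings', shape=[2, 8])
label_vectors = tf.nn.embedding_lookup(label_embeddings, label_ids)  # shape (4, 8)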