How to pass label to matrix of XGBoost JVM Package - xgboost

I found the JVM package quick start example of creating data matrix at official doc
float[] data = new float[] {1f,2f,3f,4f,5f,6f};
int nrow = 3;
int ncol = 2;
float missing = 0.0f;
DMatrix dmat = new DMatrix(data, nrow, ncol, missing);
// and a DMatrix could be called by train method
Booster booster = XGBoost.train(dmat, params, nround, watches, null, null);
However, the doc says the JVM package DMatrix usage is the same with Python package, in Python, there is a label parameter
dtrain = xgb.DMatrix(data, label=label)
But I could not find any in JVM package, how to set the label?

In XGBoost JVM package we could use dMatrix.setLabel(labelArray) to set the label (According to XGBoost github issue page)

Related

How to use the 'sphereize data' option with PCA in TensorFlow

I have used PCA with the 'Sphereize data' option on the following page successfully: https://projector.tensorflow.org/
I wonder how to run the same computation locally using the TensorFlow API. I found the PCA documentation in the API documentation, but I am not sure if sphereizing the data is available somewhere in the API too?
The "sphereize data" option normalizes the data by shifting each point by the centroid and making unit norm.
Here is the code used in Tensorboard (in typescript):
normalize() {
// Compute the centroid of all data points.
let centroid = vector.centroid(this.points, (a) => a.vector);
if (centroid == null) {
throw Error('centroid should not be null');
}
// Shift all points by the centroid and make them unit norm.
for (let id = 0; id < this.points.length; ++id) {
let dataPoint = this.points[id];
dataPoint.vector = vector.sub(dataPoint.vector, centroid);
if (vector.norm2(dataPoint.vector) > 0) {
// If we take the unit norm of a vector of all 0s, we get a vector of
// all NaNs. We prevent that with a guard.
vector.unit(dataPoint.vector);
}
}
}
You can reproduce that normalization using the following python function:
def sphereize_data(x):
"""
x is a 2D Tensor of shape :(num_vectors, dim_vectors)
"""
centroids = tf.reduce_mean(x, axis=0, keepdims=True)
return tf.math.div_no_nan((x - centroids), tf.norm(x - centroids, axis=0, keepdims=True))

Tensorflow Lite Model: Incompatible shapes for input and output array

I'm currently working on a Tensorflow Lite image classifier app that can recognice UNO cards. But when I'm running the float model in the class ImageClassifier, something is wrong.
The error is the next one:
java.lang.IllegalArgumentException: Cannot copy from a TensorFlowLite tensor (Identity) with shape [1, 10647, 4] to a Java object with shape [1, 15].
Here's the code that throw that error:
tflite.run(imgData, labelProbArray);
And this is how I have created imgData and labelProbArray:
private static final int DIM_BATCH_SIZE = 1;
private static final int DIM_PIXEL_SIZE = 3; //r+g+b = 1+1+1
static final int DIM_IMG_SIZE_X = 416;
static final int DIM_IMG_SIZE_Y = 416;
imgData = ByteBuffer.allocateDirect(DIM_BATCH_SIZE * DIM_IMG_SIZE_X * DIM_IMG_SIZE_Y * DIM_PIXEL_SIZE * 4); //The last value because size of float is 4
labelProbArray = new float[1][labelList.size()]; // {1, 15}
You can chech the inputs and ouputs of the .tflite file. Source.
I know you should create a buffer for the output values, but I tried to import this and didn't work:
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer;
Any ideas? Thank you so much for read me^^
Edit v2:
Thanks to yyoon I realise that I didn't populate my model with metadata, so I run this line in my cmd:
python ./metadata_writer_for_image_classifier_uno.py \ --model_file=./model_without_metadata/custom.tflite \ --label_file=./model_without_metadata/labels.txt \ --export_directory=model_with_metadata
Before that, I modified this file with my data:
_MODEL_INFO = {
"custom.tflite":
ModelSpecificInfo(
name="UNO image classifier",
version="v1",
image_width=416,
image_height=416,
image_min=0,
image_max=255,
mean=[127.5],
std=[127.5],
num_classes=15)
}
And another error appeared:
ValueError: The number of output tensors (2) should match the number of output tensor metadata (1)
Idk why my model have 2 tensors outputs...

How to read parameters of layers of .tflite model in python

I was trying to read tflite model and pull all the parameters of the layers out.
My steps:
I generated flatbuffers model representation by running (please build flatc before):
flatc -python tensorflow/tensorflow/lite/schema/schema.fbs
Result is tflite/ folder that contains layer description files (*.py) and some utilitarian files.
I successfully loaded model:
in case of import Error: set PYTHONPATH to point to the folder where tflite/ is
from tflite.Model import Model
def read_tflite_model(file):
buf = open(file, "rb").read()
buf = bytearray(buf)
model = Model.GetRootAsModel(buf, 0)
return model
I partly pulled model and node parameters out and stacked in iterating over nodes:
Model part:
def print_model_info(model):
version = model.Version()
print("Model version:", version)
description = model.Description().decode('utf-8')
print("Description:", description)
subgraph_len = model.SubgraphsLength()
print("Subgraph length:", subgraph_len)
Nodes part:
def print_nodes_info(model):
# what does this 0 mean? should it always be zero?
subgraph = model.Subgraphs(0)
operators_len = subgraph.OperatorsLength()
print('Operators length:', operators_len)
from collections import deque
nodes = deque(subgraph.InputsAsNumpy())
STEP_N = 0
MAX_STEPS = operators_len
print("Nodes info:")
while len(nodes) != 0 and STEP_N <= MAX_STEPS:
print("MAX_STEPS={} STEP_N={}".format(MAX_STEPS, STEP_N))
print("-" * 60)
node_id = nodes.pop()
print("Node id:", node_id)
tensor = subgraph.Tensors(node_id)
print("Node name:", tensor.Name().decode('utf-8'))
print("Node shape:", tensor.ShapeAsNumpy())
# which type is it? what does it mean?
type_of_tensor = tensor.Type()
print("Tensor type:", type_of_tensor)
quantization = tensor.Quantization()
min = quantization.MinAsNumpy()
max = quantization.MaxAsNumpy()
scale = quantization.ScaleAsNumpy()
zero_point = quantization.ZeroPointAsNumpy()
print("Quantization: ({}, {}), s={}, z={}".format(min, max, scale, zero_point))
# I do not understand it again. what is j, that I set to 0 here?
operator = subgraph.Operators(0)
for i in operator.OutputsAsNumpy():
nodes.appendleft(i)
STEP_N += 1
print("-"*60)
Please point me to documentation or some example of using this API.
My problems are:
I can not get documentation on this API
Iterating over Tensor objects seems not possible for me, as it doesn't have Inputs and Outputs methods. + subgraph.Operators(j=0) I do not understand what j means in here. Because of that my cycle goes through two nodes: input (once) and the next one over and over again.
Iterating over Operator objects is surely possible:
Here we iterate over them all but I can not get how to map Operator and Tensor.
def print_in_out_info_of_all_operators(model):
# what does this 0 mean? should it always be zero?
subgraph = model.Subgraphs(0)
for i in range(subgraph.OperatorsLength()):
operator = subgraph.Operators(i)
print('Outputs', operator.OutputsAsNumpy())
print('Inputs', operator.InputsAsNumpy())
I do not understand how to pull parameters out Operator object. BuiltinOptions method gives me Table object, that I do not know what to map at.
subgraph = model.Subgraphs(0)
What does this 0 mean? should it always be zero? obviously no, but what is it? Id of the subgraph? If so - I'm happy. If no, please try to explain it.

Generate covariance matrices in batch (TensorFlow)

I wonder how to generate covariance matrices in batch in TensorFlow. If we use the following code
dim = 3
df = 5
ds = tf.contrib.distributions
scale_sqrt = tf.random_normal([dim, dim], seed=1234)
scale = tf.matmul(scale_sqrt, tf.transpose(scale_sqrt))
sigma = ds.WishartCholesky(df=df, scale=scale).sample()
it would work. But if we try the batch version of this code by adding an additional batch dimension then TF would throw an error. My batch version looks as follows:
dim = 3
df = 5
ds = tf.contrib.distributions
num_per_batch = 10
scale_sqrt = tf.random_normal([num_per_batch, dim, dim], seed=1234)
scale = tf.matmul(scale_sqrt, tf.transpose(scale_sqrt, [0,2,1]))
sigma = ds.WishartCholesky(df=df, scale=scale).sample()
Please let me know how to sample in batch efficiently.
For posteriority, this bug has been fixed, and should be available in the tf-nightly versions (when I call sess.run(...) on sigma, I get back values with no errors).

How to import an saved Tensorflow model train using tf.estimator and predict on input data

I have save the model using tf.estimator .method export_savedmodel as follows:
export_dir="exportModel/"
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
input_receiver_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
classifier.export_savedmodel(export_dir, input_receiver_fn, as_text=False, checkpoint_path="Model/model.ckpt-400")
How can I import this saved model and use for predictions?
I tried to search for a good base example, but it appears the documentation and samples are a bit scattered for this topic. So let's start with a base example: the tf.estimator quickstart.
That particular example doesn't actually export a model, so let's do that (not need for use case 1):
def serving_input_receiver_fn():
"""Build the serving inputs."""
# The outer dimension (None) allows us to batch up inputs for
# efficiency. However, it also means that if we want a prediction
# for a single instance, we'll need to wrap it in an outer list.
inputs = {"x": tf.placeholder(shape=[None, 4], dtype=tf.float32)}
return tf.estimator.export.ServingInputReceiver(inputs, inputs)
export_dir = classifier.export_savedmodel(
export_dir_base="/path/to/model",
serving_input_receiver_fn=serving_input_receiver_fn)
Huge asterisk on this code: there appears to be a bug in TensorFlow 1.3 that doesn't allow you to do the above export on a "canned" estimator (such as DNNClassifier). For a workaround, see the "Appendix: Workaround" section.
The code below references export_dir (return value from the export step) to emphasize that it is not "/path/to/model", but rather, a subdirectory of that directory whose name is a timestamp.
Use Case 1: Perform prediction in the same process as training
This is an sci-kit learn type of experience, and is already exemplified by the sample. For completeness' sake, you simply call predict on the trained model:
classifier.train(input_fn=train_input_fn, steps=2000)
# [...snip...]
predictions = list(classifier.predict(input_fn=predict_input_fn))
predicted_classes = [p["classes"] for p in predictions]
Use Case 2: Load a SavedModel into Python/Java/C++ and perform predictions
Python Client
Perhaps the easiest thing to use if you want to do prediction in Python is SavedModelPredictor. In the Python program that will use the SavedModel, we need code like this:
from tensorflow.contrib import predictor
predict_fn = predictor.from_saved_model(export_dir)
predictions = predict_fn(
{"x": [[6.4, 3.2, 4.5, 1.5],
[5.8, 3.1, 5.0, 1.7]]})
print(predictions['scores'])
Java Client
package dummy;
import java.nio.FloatBuffer;
import java.util.Arrays;
import java.util.List;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
public class Client {
public static void main(String[] args) {
Session session = SavedModelBundle.load(args[0], "serve").session();
Tensor x =
Tensor.create(
new long[] {2, 4},
FloatBuffer.wrap(
new float[] {
6.4f, 3.2f, 4.5f, 1.5f,
5.8f, 3.1f, 5.0f, 1.7f
}));
// Doesn't look like Java has a good way to convert the
// input/output name ("x", "scores") to their underlying tensor,
// so we hard code them ("Placeholder:0", ...).
// You can inspect them on the command-line with saved_model_cli:
//
// $ saved_model_cli show --dir $EXPORT_DIR --tag_set serve --signature_def serving_default
final String xName = "Placeholder:0";
final String scoresName = "dnn/head/predictions/probabilities:0";
List<Tensor> outputs = session.runner()
.feed(xName, x)
.fetch(scoresName)
.run();
// Outer dimension is batch size; inner dimension is number of classes
float[][] scores = new float[2][3];
outputs.get(0).copyTo(scores);
System.out.println(Arrays.deepToString(scores));
}
}
C++ Client
You'll likely want to use tensorflow::LoadSavedModel with Session.
#include <unordered_set>
#include <utility>
#include <vector>
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/public/session.h"
namespace tf = tensorflow;
int main(int argc, char** argv) {
const string export_dir = argv[1];
tf::SavedModelBundle bundle;
tf::Status load_status = tf::LoadSavedModel(
tf::SessionOptions(), tf::RunOptions(), export_dir, {"serve"}, &bundle);
if (!load_status.ok()) {
std::cout << "Error loading model: " << load_status << std::endl;
return -1;
}
// We should get the signature out of MetaGraphDef, but that's a bit
// involved. We'll take a shortcut like we did in the Java example.
const string x_name = "Placeholder:0";
const string scores_name = "dnn/head/predictions/probabilities:0";
auto x = tf::Tensor(tf::DT_FLOAT, tf::TensorShape({2, 4}));
auto matrix = x.matrix<float>();
matrix(0, 0) = 6.4;
matrix(0, 1) = 3.2;
matrix(0, 2) = 4.5;
matrix(0, 3) = 1.5;
matrix(0, 1) = 5.8;
matrix(0, 2) = 3.1;
matrix(0, 3) = 5.0;
matrix(0, 4) = 1.7;
std::vector<std::pair<string, tf::Tensor>> inputs = {{x_name, x}};
std::vector<tf::Tensor> outputs;
tf::Status run_status =
bundle.session->Run(inputs, {scores_name}, {}, &outputs);
if (!run_status.ok()) {
cout << "Error running session: " << run_status << std::endl;
return -1;
}
for (const auto& tensor : outputs) {
std::cout << tensor.matrix<float>() << std::endl;
}
}
Use Case 3: Serve a model using TensorFlow Serving
Exporting models in a manner amenable to serving a Classification model requires that the input be a tf.Example object. Here's how we might export a model for TensorFlow serving:
def serving_input_receiver_fn():
"""Build the serving inputs."""
# The outer dimension (None) allows us to batch up inputs for
# efficiency. However, it also means that if we want a prediction
# for a single instance, we'll need to wrap it in an outer list.
example_bytestring = tf.placeholder(
shape=[None],
dtype=tf.string,
)
features = tf.parse_example(
example_bytestring,
tf.feature_column.make_parse_example_spec(feature_columns)
)
return tf.estimator.export.ServingInputReceiver(
features, {'examples': example_bytestring})
export_dir = classifier.export_savedmodel(
export_dir_base="/path/to/model",
serving_input_receiver_fn=serving_input_receiver_fn)
The reader is referred to TensorFlow Serving's documentation for more instructions on how to setup TensorFlow Serving, so I'll only provide the client code here:
# Omitting a bunch of connection/initialization code...
# But at some point we end up with a stub whose lifecycle
# is generally longer than that of a single request.
stub = create_stub(...)
# The actual values for prediction. We have two examples in this
# case, each consisting of a single, multi-dimensional feature `x`.
# This data here is the equivalent of the map passed to the
# `predict_fn` in use case #2.
examples = [
tf.train.Example(
features=tf.train.Features(
feature={"x": tf.train.Feature(
float_list=tf.train.FloatList(value=[6.4, 3.2, 4.5, 1.5]))})),
tf.train.Example(
features=tf.train.Features(
feature={"x": tf.train.Feature(
float_list=tf.train.FloatList(value=[5.8, 3.1, 5.0, 1.7]))})),
]
# Build the RPC request.
predict_request = predict_pb2.PredictRequest()
predict_request.model_spec.name = "default"
predict_request.inputs["examples"].CopyFrom(
tensor_util.make_tensor_proto(examples, tf.float32))
# Perform the actual prediction.
stub.Predict(request, PREDICT_DEADLINE_SECS)
Note that the key, examples, that is referenced in the predict_request.inputs needs to match the key used in the serving_input_receiver_fn at export time (cf. the constructor to ServingInputReceiver in that code).
Appendix: Working around Exports from Canned Models in TF 1.3
There appears to be a bug in TensorFlow 1.3 in which canned models do not export properly for use case 2 (the problem does not exist for "custom" estimators). Here's is a workaround that wraps a DNNClassifier to make things work, specifically for the Iris example:
# Build 3 layer DNN with 10, 20, 10 units respectively.
class Wrapper(tf.estimator.Estimator):
def __init__(self, **kwargs):
dnn = tf.estimator.DNNClassifier(**kwargs)
def model_fn(mode, features, labels):
spec = dnn._call_model_fn(features, labels, mode)
export_outputs = None
if spec.export_outputs:
export_outputs = {
"serving_default": tf.estimator.export.PredictOutput(
{"scores": spec.export_outputs["serving_default"].scores,
"classes": spec.export_outputs["serving_default"].classes})}
# Replace the 3rd argument (export_outputs)
copy = list(spec)
copy[4] = export_outputs
return tf.estimator.EstimatorSpec(mode, *copy)
super(Wrapper, self).__init__(model_fn, kwargs["model_dir"], dnn.config)
classifier = Wrapper(feature_columns=feature_columns,
hidden_units=[10, 20, 10],
n_classes=3,
model_dir="/tmp/iris_model")
I dont think there is a bug with canned Estimators (or rather if there was ever one, it has been fixed). I was able to successfully export a canned estimator model using Python and import it in Java.
Here is my code to export the model:
a = tf.feature_column.numeric_column("a");
b = tf.feature_column.numeric_column("b");
feature_columns = [a, b];
model = tf.estimator.DNNClassifier(feature_columns=feature_columns ...);
# To export
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns);
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec);
servable_model_path = model.export_savedmodel(servable_model_dir, export_input_fn, as_text=True);
To import the model in Java, I used the Java client code provided by rhaertel80 above and it works. Hope this also answers Ben Fowler's question above.
It appears that the TensorFlow team does not agree that there is a bug in version 1.3 using canned estimators for exporting a model under use case #2. I submitted a bug report here:
https://github.com/tensorflow/tensorflow/issues/13477
The response I received from TensorFlow is that the input must only be a single string tensor. It appears that there may be a way to consolidate multiple features into a single string tensor using serialized TF.examples, but I have not found a clear method to do this. If anyone has code showing how to do this, I would be appreciative.
You need to export the saved model using tf.contrib.export_savedmodel and you need to define input receiver function to pass input to.
Later you can load the saved model ( generally saved.model.pb) from the disk and serve it.
TensorFlow: How to predict from a SavedModel?