How to use the 'sphereize data' option with PCA in TensorFlow - tensorflow

I have used PCA with the 'Sphereize data' option on the following page successfully: https://projector.tensorflow.org/
I wonder how to run the same computation locally using the TensorFlow API. I found the PCA documentation in the API documentation, but I am not sure if sphereizing the data is available somewhere in the API too?

The "sphereize data" option normalizes the data by shifting each point by the centroid and then scaling each point to unit norm.
Here is the code used in TensorBoard (in TypeScript):
normalize() {
  // Compute the centroid of all data points.
  let centroid = vector.centroid(this.points, (a) => a.vector);
  if (centroid == null) {
    throw Error('centroid should not be null');
  }
  // Shift all points by the centroid and make them unit norm.
  for (let id = 0; id < this.points.length; ++id) {
    let dataPoint = this.points[id];
    dataPoint.vector = vector.sub(dataPoint.vector, centroid);
    if (vector.norm2(dataPoint.vector) > 0) {
      // If we take the unit norm of a vector of all 0s, we get a vector of
      // all NaNs. We prevent that with a guard.
      vector.unit(dataPoint.vector);
    }
  }
}
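You can reproduce that normalization using the following Python function:
import tensorflow as tf
def sphereize_data(x):
    """
    x is a 2D Tensor of shape (num_vectors, dim_vectors)
    """
    # Centroid of all points, shape (1, dim_vectors).
    centroid = tf.reduce_mean(x, axis=0, keepdims=True)
    centered = x - centroid
    # Norm of each vector (axis=1), so every row ends up with unit length.
    # div_no_nan leaves an all-zero row at zero instead of producing NaNs.
    return tf.math.div_no_nan(centered, tf.norm(centered, axis=1, keepdims=True))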
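To check the normalization by hand, the same two steps as the TypeScript code above (subtract the centroid, then scale each point to unit length) can be sketched in plain Python without TensorFlow. The function name `sphereize` and the sample points are illustrative assumptions, not part of the TensorBoard code.

```python
import math

def sphereize(points):
    """Center points on their centroid, then scale each one to unit length."""
    if not points:
        return []
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    result = []
    for p in points:
        centered = [p[i] - centroid[i] for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in centered))
        # Guard against dividing by zero for a point that sits on the centroid.
        result.append([c / norm for c in centered] if norm > 0 else centered)
    return result

# Hypothetical sample data: three 2D points.
sphered = sphereize([[1.0, 0.0], [3.0, 0.0], [2.0, 3.0]])
```

Every row of `sphered` has length 1, unless the original point coincided with the centroid, in which case it stays at the origin.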

Related

Indexing in Rust ndarray crate based on a boolean mask

I would like to efficiently index into an ndarray using a boolean mask. To better convey what I mean I have some working numpy code and then my attempt in rust ndarray which works but is extremely inefficient.
Numpy:
import numpy as np
shape = (100, 100, 100)
grouping_array = np.random.randint(0, 100, size=shape)
data_array = np.random.rand(*shape)
for i in range(1, 100):
ith_mean = data_array[grouping_array == i].mean()
print(ith_mean)
Rust ndarray:
use ndarray::{Array, IxDyn};
fn group_means(
    data: &Array<f32, IxDyn>,
    grouping_var: &Array<f32, IxDyn>,
    n_groups: i32,
) {
    for group in 1..n_groups {
        // Boolean mask that is true where the element belongs to this group.
        let index_array = grouping_var.mapv(|x| x == group as f32);
        let roi_data = Array::from_iter(
            data
                .iter()
                .zip(index_array.iter())
                .map(|(x, y)| if *y { *x } else { 0. })
        );
        let mean_roi = roi_data.mean().unwrap();
        println!("group {}; mean {}", group, mean_roi);
    }
}
Here each iteration in the n_groups loop takes about as long as the whole numpy script which is done in less than a second. Is there a better way to do this in the rust-ndarray version?
This is likely not a surprise to others, but since my grouping_var array should (in my use case) always be a 3D array, I changed its type (and therefore also that of index_array) from &Array<f32, IxDyn> to &Array<f32, Ix3>, which dramatically improved performance.
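Independently of the dimension type, the loop above rescans the whole array once per group. A single pass that accumulates a sum and a count per group avoids that. The sketch below shows the idea in plain Python over flattened sequences; the function name and sample values are assumptions for illustration, and this is not the ndarray API. The same accumulation could presumably be written with Rust iterators.

```python
from collections import defaultdict

def group_means(data, groups):
    """Return {group_id: mean of the data values carrying that group id}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # One pass over both flattened sequences instead of one scan per group.
    for value, group in zip(data, groups):
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}

# Hypothetical flattened data and group labels.
means = group_means([1.0, 2.0, 3.0, 4.0, 5.0], [1, 2, 1, 2, 3])
```

Like the numpy masked mean, this averages only the elements that belong to each group.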

TensorFlow - how to import data with multiple labels

I'm trying to create a model in TensorFlow which predicts ideal item for a user by predicting a vector of numbers.
I have created a dataset in Spark and saved it as a TFRecord using Spark TensorFlow connector.
In the dataset, I have several hundreds of features and 20 labels in each row. For easier manipulation, I have given every column a prefix 'feature_' or 'label_'.
Now I'm trying to write input function for TensorFlow, but I can't figure out how to parse the data.
So far I have written this:
def dataset_input_fn():
    path = ['data.tfrecord']
    dataset = tf.data.TFRecordDataset(path)
    def parser(record):
        example = tf.train.Example()
        example.ParseFromString(record)
        # TODO: no idea what to do here
        # features = parsed["features"]
        # label = parsed["label"]
        # return features, label
    dataset = dataset.map(parser)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(32)
    dataset = dataset.repeat(100)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels
How can I split the Example into a feature set and a label set? I have tried to split the Example into two parts, but there is no way to even access it. The only way I have managed to access it is by printing the example out, which gives me something like this.
features {
...
feature {
key: "feature_wishlist_hour"
value {
int64_list {
value: 0
}
}
}
feature {
key: "label_emb_1"
value {
float_list {
value: 0.4
}
}
}
feature {
key: "label_emb_2"
value {
float_list {
value: 0.8
}
}
}
...
}
Your parser function should mirror how you constructed the example proto. In your case it should be something similar to:
# example proto decode
def parser(example_proto):
    keys_to_features = {'feature_wishlist_hour': tf.FixedLenFeature((), tf.int64),
                        'label_emb_1': tf.FixedLenFeature((), tf.float32),
                        'label_emb_2': tf.FixedLenFeature((), tf.float32)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return parsed_features['feature_wishlist_hour'], (parsed_features['label_emb_1'], parsed_features['label_emb_2'])
EDIT: From the comments it seems you are encoding each of the features as a separate key-value pair, which is not right. Check this answer on how to write it properly: Numpy to TFrecords: Is there a more simple way to handle batch inputs from tfrecords?
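With several hundred 'feature_' columns and 20 'label_' columns, listing every key by hand in the return statement gets unwieldy. Once a record has been parsed into a dictionary keyed by column name, the split can be done by prefix. The sketch below uses a plain dict as a stand-in for the parsed record; the helper name is an assumption, and the same comprehension should apply to the dictionary returned by the parsing call.

```python
def split_features_and_labels(parsed, feature_prefix="feature_", label_prefix="label_"):
    """Split one parsed record (a dict keyed by column name) into two dicts."""
    features = {k: v for k, v in parsed.items() if k.startswith(feature_prefix)}
    labels = {k: v for k, v in parsed.items() if k.startswith(label_prefix)}
    return features, labels

# Stand-in for a parsed record, using the keys from the printed example.
parsed = {"feature_wishlist_hour": 0, "label_emb_1": 0.4, "label_emb_2": 0.8}
features, labels = split_features_and_labels(parsed)
```

Keys that match neither prefix are dropped from both dictionaries.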

How to import a saved TensorFlow model trained using tf.estimator and predict on input data

I have saved the model using the tf.estimator method export_savedmodel as follows:
export_dir="exportModel/"
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
input_receiver_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
classifier.export_savedmodel(export_dir, input_receiver_fn, as_text=False, checkpoint_path="Model/model.ckpt-400")
How can I import this saved model and use for predictions?
I tried to search for a good base example, but it appears the documentation and samples are a bit scattered for this topic. So let's start with a base example: the tf.estimator quickstart.
That particular example doesn't actually export a model, so let's do that (not needed for use case 1):
def serving_input_receiver_fn():
    """Build the serving inputs."""
    # The outer dimension (None) allows us to batch up inputs for
    # efficiency. However, it also means that if we want a prediction
    # for a single instance, we'll need to wrap it in an outer list.
    inputs = {"x": tf.placeholder(shape=[None, 4], dtype=tf.float32)}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
export_dir = classifier.export_savedmodel(
    export_dir_base="/path/to/model",
    serving_input_receiver_fn=serving_input_receiver_fn)
Huge asterisk on this code: there appears to be a bug in TensorFlow 1.3 that doesn't allow you to do the above export on a "canned" estimator (such as DNNClassifier). For a workaround, see the "Appendix: Workaround" section.
The code below references export_dir (return value from the export step) to emphasize that it is not "/path/to/model", but rather, a subdirectory of that directory whose name is a timestamp.
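If the export ran in a different process and the return value is no longer at hand, the timestamped subdirectory can be located afterwards. The sketch below assumes the subdirectory names consist only of digits (a numeric timestamp) and picks the largest one; the helper name and the demo timestamps are made up for illustration.

```python
import os
import tempfile

def latest_export_dir(export_dir_base):
    """Return the numerically largest all-digit subdirectory, or None."""
    candidates = [
        name for name in os.listdir(export_dir_base)
        if name.isdigit() and os.path.isdir(os.path.join(export_dir_base, name))
    ]
    if not candidates:
        return None
    return os.path.join(export_dir_base, max(candidates, key=int))

# Demonstration on a throwaway directory with two made-up timestamps.
base = tempfile.mkdtemp()
for stamp in ("1507000000", "1507000500"):
    os.makedirs(os.path.join(base, stamp))
latest = latest_export_dir(base)
```

The returned path plays the role of export_dir in the client code that follows.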
Use Case 1: Perform prediction in the same process as training
This is a scikit-learn type of experience, and is already exemplified by the sample. For completeness' sake, you simply call predict on the trained model:
classifier.train(input_fn=train_input_fn, steps=2000)
# [...snip...]
predictions = list(classifier.predict(input_fn=predict_input_fn))
predicted_classes = [p["classes"] for p in predictions]
Use Case 2: Load a SavedModel into Python/Java/C++ and perform predictions
Python Client
Perhaps the easiest thing to use if you want to do prediction in Python is SavedModelPredictor. In the Python program that will use the SavedModel, we need code like this:
from tensorflow.contrib import predictor
predict_fn = predictor.from_saved_model(export_dir)
predictions = predict_fn(
{"x": [[6.4, 3.2, 4.5, 1.5],
[5.8, 3.1, 5.0, 1.7]]})
print(predictions['scores'])
Java Client
package dummy;
import java.nio.FloatBuffer;
import java.util.Arrays;
import java.util.List;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
public class Client {
public static void main(String[] args) {
Session session = SavedModelBundle.load(args[0], "serve").session();
Tensor x =
Tensor.create(
new long[] {2, 4},
FloatBuffer.wrap(
new float[] {
6.4f, 3.2f, 4.5f, 1.5f,
5.8f, 3.1f, 5.0f, 1.7f
}));
// Doesn't look like Java has a good way to convert the
// input/output name ("x", "scores") to their underlying tensor,
// so we hard code them ("Placeholder:0", ...).
// You can inspect them on the command-line with saved_model_cli:
//
// $ saved_model_cli show --dir $EXPORT_DIR --tag_set serve --signature_def serving_default
final String xName = "Placeholder:0";
final String scoresName = "dnn/head/predictions/probabilities:0";
List<Tensor> outputs = session.runner()
.feed(xName, x)
.fetch(scoresName)
.run();
// Outer dimension is batch size; inner dimension is number of classes
float[][] scores = new float[2][3];
outputs.get(0).copyTo(scores);
System.out.println(Arrays.deepToString(scores));
}
}
C++ Client
You'll likely want to use tensorflow::LoadSavedModel with Session.
#include <iostream>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/public/session.h"
namespace tf = tensorflow;
int main(int argc, char** argv) {
  const std::string export_dir = argv[1];
  tf::SavedModelBundle bundle;
  tf::Status load_status = tf::LoadSavedModel(
      tf::SessionOptions(), tf::RunOptions(), export_dir, {"serve"}, &bundle);
  if (!load_status.ok()) {
    std::cout << "Error loading model: " << load_status << std::endl;
    return -1;
  }
  // We should get the signature out of MetaGraphDef, but that's a bit
  // involved. We'll take a shortcut like we did in the Java example.
  const std::string x_name = "Placeholder:0";
  const std::string scores_name = "dnn/head/predictions/probabilities:0";
  auto x = tf::Tensor(tf::DT_FLOAT, tf::TensorShape({2, 4}));
  auto matrix = x.matrix<float>();
  // First example (row 0).
  matrix(0, 0) = 6.4;
  matrix(0, 1) = 3.2;
  matrix(0, 2) = 4.5;
  matrix(0, 3) = 1.5;
  // Second example (row 1).
  matrix(1, 0) = 5.8;
  matrix(1, 1) = 3.1;
  matrix(1, 2) = 5.0;
  matrix(1, 3) = 1.7;
  std::vector<std::pair<std::string, tf::Tensor>> inputs = {{x_name, x}};
  std::vector<tf::Tensor> outputs;
  tf::Status run_status =
      bundle.session->Run(inputs, {scores_name}, {}, &outputs);
  if (!run_status.ok()) {
    std::cout << "Error running session: " << run_status << std::endl;
    return -1;
  }
  for (const auto& tensor : outputs) {
    std::cout << tensor.matrix<float>() << std::endl;
  }
}
Use Case 3: Serve a model using TensorFlow Serving
Exporting models in a manner amenable to serving a Classification model requires that the input be a tf.Example object. Here's how we might export a model for TensorFlow serving:
def serving_input_receiver_fn():
    """Build the serving inputs."""
    # The outer dimension (None) allows us to batch up inputs for
    # efficiency. However, it also means that if we want a prediction
    # for a single instance, we'll need to wrap it in an outer list.
    example_bytestring = tf.placeholder(
        shape=[None],
        dtype=tf.string,
    )
    features = tf.parse_example(
        example_bytestring,
        tf.feature_column.make_parse_example_spec(feature_columns)
    )
    return tf.estimator.export.ServingInputReceiver(
        features, {'examples': example_bytestring})
export_dir = classifier.export_savedmodel(
    export_dir_base="/path/to/model",
    serving_input_receiver_fn=serving_input_receiver_fn)
The reader is referred to TensorFlow Serving's documentation for more instructions on how to set up TensorFlow Serving, so I'll only provide the client code here:
# Omitting a bunch of connection/initialization code...
# But at some point we end up with a stub whose lifecycle
# is generally longer than that of a single request.
stub = create_stub(...)
# The actual values for prediction. We have two examples in this
# case, each consisting of a single, multi-dimensional feature `x`.
# This data here is the equivalent of the map passed to the
# `predict_fn` in use case #2.
examples = [
tf.train.Example(
features=tf.train.Features(
feature={"x": tf.train.Feature(
float_list=tf.train.FloatList(value=[6.4, 3.2, 4.5, 1.5]))})),
tf.train.Example(
features=tf.train.Features(
feature={"x": tf.train.Feature(
float_list=tf.train.FloatList(value=[5.8, 3.1, 5.0, 1.7]))})),
]
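# Build the RPC request. The serving input is a batch of serialized
# tf.Example strings, so serialize each example before packing it.
predict_request = predict_pb2.PredictRequest()
predict_request.model_spec.name = "default"
predict_request.inputs["examples"].CopyFrom(
    tensor_util.make_tensor_proto(
        [example.SerializeToString() for example in examples], tf.string))
# Perform the actual prediction.
stub.Predict(predict_request, PREDICT_DEADLINE_SECS)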
Note that the key, examples, that is referenced in the predict_request.inputs needs to match the key used in the serving_input_receiver_fn at export time (cf. the constructor to ServingInputReceiver in that code).
Appendix: Working around Exports from Canned Models in TF 1.3
There appears to be a bug in TensorFlow 1.3 in which canned models do not export properly for use case 2 (the problem does not exist for "custom" estimators). Here is a workaround that wraps a DNNClassifier to make things work, specifically for the Iris example:
# Build 3 layer DNN with 10, 20, 10 units respectively.
class Wrapper(tf.estimator.Estimator):
    def __init__(self, **kwargs):
        dnn = tf.estimator.DNNClassifier(**kwargs)
        def model_fn(mode, features, labels):
            spec = dnn._call_model_fn(features, labels, mode)
            export_outputs = None
            if spec.export_outputs:
                export_outputs = {
                    "serving_default": tf.estimator.export.PredictOutput(
                        {"scores": spec.export_outputs["serving_default"].scores,
                         "classes": spec.export_outputs["serving_default"].classes})}
            # Replace the export_outputs entry (index 4 of the spec tuple).
            copy = list(spec)
            copy[4] = export_outputs
            return tf.estimator.EstimatorSpec(mode, *copy)
        super(Wrapper, self).__init__(model_fn, kwargs["model_dir"], dnn.config)
classifier = Wrapper(feature_columns=feature_columns,
                     hidden_units=[10, 20, 10],
                     n_classes=3,
                     model_dir="/tmp/iris_model")
I don't think there is a bug with canned Estimators (or rather, if there ever was one, it has been fixed). I was able to successfully export a canned estimator model using Python and import it in Java.
Here is my code to export the model:
a = tf.feature_column.numeric_column("a")
b = tf.feature_column.numeric_column("b")
feature_columns = [a, b]
model = tf.estimator.DNNClassifier(feature_columns=feature_columns ...)
# To export
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
servable_model_path = model.export_savedmodel(servable_model_dir, export_input_fn, as_text=True)
To import the model in Java, I used the Java client code provided by rhaertel80 above and it works. Hope this also answers Ben Fowler's question above.
It appears that the TensorFlow team does not agree that there is a bug in version 1.3 using canned estimators for exporting a model under use case #2. I submitted a bug report here:
https://github.com/tensorflow/tensorflow/issues/13477
The response I received from the TensorFlow team is that the input must be a single string tensor. It appears that there may be a way to consolidate multiple features into a single string tensor using serialized tf.Example protos, but I have not found a clear method to do this. If anyone has code showing how to do this, I would appreciate it.
You need to export the saved model using the estimator's export_savedmodel method, and you need to define an input receiver function that the inputs are passed to.
Later you can load the saved model (generally saved_model.pb) from disk and serve it.
TensorFlow: How to predict from a SavedModel?

PCoA (Principal *Coordinate* Analysis) in Accord.net

I've been trying to use PCA (Principal Component Analysis) in Accord.net but am not getting the correct results for PCoA.
Is there a way to achieve this without writing the algo myself?
var pca = new PrincipalComponentAnalysis()
{
Method = PrincipalComponentMethod.Standardize,
Whiten = true
};
MultivariateLinearRegression transform = pca.Learn(distances);
pca.NumberOfOutputs = 2;
double[][] output = pca.Transform(distances);
Note that the "distances" matrix is an N×N (1 − correlation) matrix of the N time series I get as input.
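For reference, PCoA (classical multidimensional scaling) differs from PCA in that it starts from a distance matrix: the squared distances are double-centered, and the eigenvectors of the resulting matrix, scaled by the square roots of their eigenvalues, give the coordinates. Whether Accord.NET exposes this directly is not established here. The sketch below covers only the double-centering step, in plain Python, and assumes the input already holds squared distances; the function name and sample points are illustrative.

```python
def double_center(squared_distances):
    """Return B = -1/2 * J * D2 * J, where J = I - (1/n) * ones."""
    n = len(squared_distances)
    row_means = [sum(row) / n for row in squared_distances]
    col_means = [sum(squared_distances[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [-0.5 * (squared_distances[i][j] - row_means[i] - col_means[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]

# Three points on a line at 0, 1 and 3; entries are squared distances.
positions = [0.0, 1.0, 3.0]
d2 = [[(a - b) ** 2 for b in positions] for a in positions]
b_matrix = double_center(d2)
```

For Euclidean distances, `b_matrix` equals the Gram matrix of the mean-centered points, which is why its leading eigenvectors recover the coordinates. Feeding that matrix to an eigendecomposition routine would complete the procedure.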

Tensorflow: how to add user custom op accepting two 1D vec tensor and output a scalar?

I'm trying the code below, but it does not work.
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
using namespace tensorflow;
REGISTER_OP("Auc")
.Input("predicts: T1")
.Input("labels: T2")
.Output("z: double")
.Attr("T1: {float, double}")
.Attr("T2: {int32, int64}")
.SetIsCommutative()
.Doc(R"doc(
Given predicts and labels, output the AUC.
)doc");
class AucOp : public OpKernel {
public:
explicit AucOp(OpKernelConstruction* context) : OpKernel(context) {}
void Compute(OpKernelContext* context) override {
// Grab the input tensor
const Tensor& predicts_tensor = context->input(0);
const Tensor& labels_tensor = context->input(1);
auto predicts = predicts_tensor.flat<double>();
auto labels = labels_tensor.flat<int32>();
// Create an output tensor
Tensor* output_tensor = NULL;
TensorShape output_shape;
OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output_tensor));
output_tensor->flat<double>().setConstant(predicts(0) * labels(0));
}
};
REGISTER_KERNEL_BUILDER(Name("Auc").Device(DEVICE_CPU), AucOp);
test.py
import tensorflow as tf
predicts = tf.constant([0.8, 0.5, 0.12])
labels = tf.constant([-1, 1, 1])
output = tf.user_ops.auc(predicts, labels)
with tf.Session() as sess:
    init = tf.initialize_all_variables()
    sess.run(init)
    print(output.eval())
./test.py
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 8
I tensorflow/core/common_runtime/direct_session.cc:60] Direct session inter op parallelism threads: 8
F ./tensorflow/core/public/tensor.h:453] Check failed: dtype() == DataTypeToEnum::v() (1 vs. 2)
Aborted
The issue is that the predicts tensor in your Python program has type float, and your op registration accepts this as a valid type for the predicts input (since T1 can be float or double), but AucOp::Compute() assumes that the predicts input always has type double (in the call to predicts_tensor.flat<double>()). The tensorflow::Tensor class does not convert the type of elements in the tensor when you ask for values of a different type, and instead it raises a fatal error.
There are several possible solutions:
To get things working quickly, you could change the type of predicts in your Python program to tf.float64 (which is a synonym for double in the Python front-end):
predicts = tf.constant([0.8, 0.5, 0.12], dtype=tf.float64)
You could start by defining a simpler op that accepts inputs of a single type only:
REGISTER_OP("Auc")
.Input("predicts: double")
.Input("labels: int32")
...;
You could add code in the AucOp::Compute() method to test the input type and access the input values as appropriate. (Use this->input_type(i) to find the type of the ith input.)
You could define a templated class AucOp<TPredict, TLabel>, then use TypeConstraint<> in the REGISTER_KERNEL_BUILDER call to define specializations for each of the four valid combinations of prediction and label types. This would look something like:
REGISTER_KERNEL_BUILDER(Name("Auc")
.Device(DEVICE_CPU)
.TypeConstraint<float>("T1")
.TypeConstraint<int32>("T2"),
AucOp<float, int32>);
// etc. for AucOp<double, int32>, AucOp<float, int64>, and AucOp<double, int64>.