Multi-column input to ML.PREDICT for a TensorFlow model in BigQuery ML

I trained a TensorFlow classifier and created it as a model in BigQuery ML using CREATE MODEL. Now I would like to use ML.PREDICT to run batch predictions with this model, but I get the error "Invalid table-valued function ml.predict Column inputs is not found in the input data to the PREDICT function."
Here's my query:
select * from ml.predict(
  model test.digital_native_classifier_kf,
  (select * from dataset_id.features_table_id)
)
In the BigQuery documentation, they give an example for a TensorFlow model with a single column aliased as input so the TensorFlow input_fn can accept it. However, this classifier accepts hundreds of features. How do I specify the query passed to ML.PREDICT so it uses all the columns in my features table?

After you load the model into BigQuery ML, click on the model in the BigQuery UI and switch over to the "Schema" tab. This should tell you what features (column names) the model wants.
It is possible that when you created the TensorFlow/Keras model you did not assign names to the input nodes; in that case, the feature names might have been auto-assigned names like int1 and float2.
Alternatively, run the program saved_model_cli on the model (it's a Python command-line program that comes with TensorFlow) to see what the supported signature is:
saved_model_cli show --dir $export_path --all
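If you would rather inspect the signature from Python than from the command line, here is a minimal sketch; it assumes a TF2 runtime and a hypothetical export_path pointing at the SavedModel directory:

import tensorflow as tf

export_path = "gs://my-bucket/exported_model"  # hypothetical path to the SavedModel

loaded = tf.saved_model.load(export_path)
serving_fn = loaded.signatures["serving_default"]

# structured_input_signature is an (args, kwargs) pair; the kwargs dict maps
# each expected input name to its TensorSpec (dtype and shape).
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)

The input names printed here are exactly the column names (or aliases) your ML.PREDICT query has to supply.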

After some research: AutoML encodes the input tensors as Prensors, which is a serialized string format packed into a single Tensor.
This means that you can't import the AutoML model from GCS directly into BQML the way you would import a TensorFlow model that explicitly encodes its different inputs as a JSON struct.
So, in order to import an AutoML model into BigQuery ML, the BigQuery engineering team would need to add support for something like model_type='automl' in addition to model_type='tensorflow'.

At the moment, multi-column is not possible. From the AutoML beginner's guide:
One column from your dataset, called the target, is what your model will learn to predict. Some number of the other data columns are inputs (called features) that the model will learn patterns from. You can use the same input features to build multiple kinds of models just by changing the target.
Also found this feature request: Multi-target AutoML Tables Request.

Related

Multi-column input to ML.PREDICT for a TensorFlow model in BigQuery ML

We have trained a model in Google Cloud AutoML (a tool that we like a lot) and successfully exported it to GCS, and then created the model in BigQuery using the command below:
create or replace model my_dataset.my_bq_ml_model
options(model_type='tensorflow',
        model_path='my gcs path to exported tensorflow model')
However, when we use BigQuery ML to try to run some predictions using the model, we are unsure how to format the multiple features that our model uses into the single "inputs" string the exported TensorFlow model accepts in BigQuery.
select *
from ml.predict(model my_project.my_dataset.my_bq_ml_model,
  (
    select 'How do we format this?' as inputs
    from my_rows_to_predict
  ))
Has anyone done this yet?
This is similar to this question, which remains open:
Multi-column input to ML.PREDICT for a TensorFlow model in BigQuery ML
Thank you all.
After you load the model into BigQuery ML, click on the model in the BigQuery UI and switch over to the "Schema" tab. This should tell you what columns the model wants.
Alternatively, run the program saved_model_cli on the model (it's a Python command-line program that comes with TensorFlow) to see what the supported signature is:
saved_model_cli show --dir $export_path --all
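Once you know the input names, the fix on the SQL side is to alias each column to the corresponding name. Below is a hedged sketch using the google-cloud-bigquery Python client; it assumes the Schema tab reports multiple named inputs (user_age and user_region here are hypothetical placeholders) rather than a single serialized inputs string:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
select *
from ml.predict(model `my_project.my_dataset.my_bq_ml_model`,
  (
    select
      age    as user_age,      -- alias each column to the model's input name
      region as user_region
    from `my_project.my_dataset.my_rows_to_predict`
  ))
"""
for row in client.query(sql).result():
    print(dict(row))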

How to parse the TensorFlow events file?

I would like to know how to extract from a model's events file the same performance results that TensorBoard displays: specifically, the Precision, Recall, and Loss numbers are of most interest. TensorBoard displays a subset of these when pointed at the model checkpoint directory.
I'm not sure if there is self-documenting information or other metadata available for these models. This one in particular is Faster R-CNN Inception: but are these outputs tied to a particular model, or are they generic in format?
I found the approach in the tensorboard package:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# evtf is the path to an events file, e.g. ".../events.out.tfevents.<timestamp>"
event_acc = EventAccumulator(evtf)
event_acc.Reload()
One of the entries is:
scal_losses = event_acc.Scalars('Loss/total_loss')
From that list we can extract attributes such as the step number and the value (of the loss):
losses = sorted([[sevt.step, sevt.value] for sevt in scal_losses])
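If you are unsure which tags your events file actually contains, the accumulator can list them; the precision tag name below is an assumption (object-detection exports usually use longer names), so substitute whatever Tags() reports:

# List every scalar tag recorded in the events file.
print(event_acc.Tags()["scalars"])

# Pull other metrics the same way; the tag name here is a hypothetical example.
precisions = sorted([[e.step, e.value] for e in event_acc.Scalars("Precision/mAP")])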

How to convert a saved_model.pb to EvalSavedModel?

I was going through the tensorflow-model-analysis documentation on evaluating TensorFlow models. The getting started guide talks about a special SavedModel called the EvalSavedModel.
Quoting the getting started guide:
This EvalSavedModel contains additional information which allows TFMA
to compute the same evaluation metrics defined in your model in a
distributed manner over a large amount of data, and user-defined
slices.
My question is how can I convert an already existing saved_model.pb to an EvalSavedModel?
EvalSavedModel is exported as a SavedModel message, so there is no need for such a conversion.
EvalSavedModel uses SavedModelBuilder under the hood. It populates the estimator graph with several placeholders and creates some additional metric collections; later on, it performs the ordinary SavedModelBuilder procedure.
Source - https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/eval_saved_model/export.py#L228
P.S. I suppose you want to run model-analysis on your model as exported by SavedModelBuilder. Since a plain SavedModel has neither the metric nodes nor the related collections that are created in an EvalSavedModel, it's useless to do so: model-analysis simply couldn't find any metrics related to your estimator.
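To make the P.S. concrete, here is a minimal sketch of a plain TF1-style SavedModelBuilder export; note that nothing in it records metric ops or collections, which is exactly what TFMA would need to find:

import tensorflow as tf

# Build a trivial graph; a real model would add its variables and ops here.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder = tf.saved_model.builder.SavedModelBuilder("/tmp/plain_export")
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING])
    builder.save()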
If I understand your question correctly, you have a saved_model.pb generated either by tf.saved_model.simple_save, by tf.saved_model.builder.SavedModelBuilder, or by estimator.export_savedmodel.
If my understanding is correct, then you are exporting the training and inference graphs to saved_model.pb.
The point mentioned in the guide on the TensorFlow site is that, in addition to exporting the training graph, we need to export the evaluation graph as well; that is what is called the EvalSavedModel.
The evaluation graph comprises the metrics for the model, so that you can evaluate the model's performance using visualizations.
Before we export the EvalSavedModel, we should prepare an eval_input_receiver_fn, similar to serving_input_receiver_fn.
We can specify other functionality there as well: for example, if you want the metrics to be defined in a distributed manner, or if you want to evaluate the model using slices of data rather than the entire dataset, such options can be set in eval_input_receiver_fn.
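A hedged sketch of such a function, adapted from the TFMA getting-started guide for the older, estimator-based API; the feature spec and the 'label' key are hypothetical and should be replaced with your model's own features and label column:

import tensorflow as tf
import tensorflow_model_analysis as tfma

def eval_input_receiver_fn():
    # Placeholder that will receive serialized tf.Examples at evaluation time.
    serialized_tf_example = tf.placeholder(
        dtype=tf.string, shape=[None], name='input_example_tensor')
    receiver_tensors = {'examples': serialized_tf_example}

    # Hypothetical feature spec; use your model's real features and label.
    feature_spec = {
        'age': tf.FixedLenFeature([], tf.float32),
        'label': tf.FixedLenFeature([], tf.int64),
    }
    features = tf.parse_example(serialized_tf_example, feature_spec)

    return tfma.export.EvalInputReceiver(
        features=features,
        receiver_tensors=receiver_tensors,
        labels=features['label'])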
Then we can export the EvalSavedModel using the code below:
tfma.export.export_eval_savedmodel(
    estimator=estimator,
    export_dir_base=export_dir,
    eval_input_receiver_fn=eval_input_receiver_fn)

Training model with ID column

I am training a model with scikit-learn, and I have an ID column in my dataset. I remove the ID column when I train the model. But with the test dataset, I need to map the predictions back to the ID column after I predict.
What is the best way to do this? Can we set a non-predictor column when building a model in scikit-learn? Also, what about other ML tools like TensorFlow and Spark ML in general; do they support this feature?
I found this post on Stack Overflow but was looking for other options.
I assume you store your data (X) in a pd.DataFrame.
If that is the case, simply extract the values into a NumPy ndarray; the corresponding rows will keep the same order. A scikit-learn-style example:
output = pd.Series(data=some_model.predict(X.values), index=X.index)
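If the ID lives in a regular column rather than the index, setting it as the index first keeps the round trip trivial. A hedged sketch with hypothetical column names ('id', 'x', 'target'):

import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.DataFrame({'id': [1, 2, 3, 4],
                      'x': [0.1, 0.9, 0.2, 0.8],
                      'target': [0, 1, 0, 1]}).set_index('id')
test = pd.DataFrame({'id': [5, 6], 'x': [0.15, 0.85]}).set_index('id')

model = LogisticRegression().fit(train[['x']], train['target'])

# Predictions align with the test index, so the IDs map straight back.
preds = pd.Series(model.predict(test[['x']]), index=test.index, name='prediction')
print(preds.reset_index())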

What are the input/output tensors for the translation (RNN) tutorial?

As per Unable to deploy a Cloud ML model, if I want to deploy my model to Google Cloud ML I need to explicitly set the "inputs"/"outputs" collections that will store the references to the input/output tensors, like this:
This collection should name all the input tensors for your graph. Similarly, a collection named "outputs" is required to name the output tensors for your graph. Assuming your graph has two input tensors x and y, and one output tensor scores, this can be done as follows:
tf.add_to_collection("inputs", json.dumps({"x": x.name, "y": y.name}))
tf.add_to_collection("outputs", json.dumps({"scores": scores.name}))
Here "x", "y" and "scores" become aliases to the actual tensor names (x.name, y.name and scores.name).
However, I do not know what the input/output tensors are in the translation (RNN) tutorial. Without this knowledge, I can't refactor the code and deploy my models to Google Cloud ML.
According to the code below, the inputs are encoder_inputs, decoder_inputs, and target_weights, and the output is the third element of the return value of step():
https://github.com/petewarden/tensorflow_makefile/blob/master/tensorflow/models/rnn/translate/seq2seq_model.py#L170
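Putting the two answers together, here is a hedged sketch of how those collections could be wired up; it assumes model is a Seq2SeqModel instance built as in seq2seq_model.py, and the choice of model.outputs[0] (the output tensors for the first bucket) is an assumption based on the step() return value described above:

import json
import tensorflow as tf

# encoder_inputs, decoder_inputs and target_weights are lists of placeholders
# on the Seq2SeqModel instance, so each element gets its own alias.
inputs = {}
for i, tensor in enumerate(model.encoder_inputs):
    inputs['encoder_input_%d' % i] = tensor.name
for i, tensor in enumerate(model.decoder_inputs):
    inputs['decoder_input_%d' % i] = tensor.name
for i, tensor in enumerate(model.target_weights):
    inputs['target_weight_%d' % i] = tensor.name
tf.add_to_collection('inputs', json.dumps(inputs))

# Hypothetical: expose the output tensors of the first bucket as the outputs.
outputs = {'output_%d' % i: t.name for i, t in enumerate(model.outputs[0])}
tf.add_to_collection('outputs', json.dumps(outputs))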