How to parse the tensorflow events file? - tensorflow

I would like to know how to extract the same performance results from the events file of the output of a model as does Tensorboard : specifically the Precision, Recall, and Loss numbers are most of interest. Here is a subset of them displayed on Tensorboard given the model checkpoint directory:
I'm not sure if there self-documenting information or other metadata available for these models. This one in particular is the Faster RNN Inception: but are these outputs tied to a particular model or are they generic in format?

Found the approach in the tensorboard package:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
event_acc = EventAccumulator(evtf)
event_acc.Reload()
One of the entries is:
scal_losses = event_acc.Scalars('Loss/total_loss')
From that list we can extract such attributes as Step [number] and Value (of the loss):
losses = sorted([[sevt.step, sevt.value] for sevt in scal_losses])

Related

Multi-column input to ML.PREDICT for a TensorFlow model in BigQuery ML

I trained a TensorFlow classifier and created it as a model in BigQuery ML using CREATE MODEL. Now I would like to use ML.PREDICT to batch predict using this model. I get the error "Invalid table-valued function ml.predict Column inputs is not found in the input data to the PREDICT function."
Here's my query:
select * from ml.predict (
model test.digital_native_classifier_kf,
(select * from dataset_id.features_table_id)
)
In the BigQuery documentation, they give an example for a TensorFlow model with a single column aliased as input so the TensorFlow input_fn can accept it. However, this classifier accepts hundreds of features. How do I specify the query passed to ML.PREDICT so it uses all the columns in my features table?
After you load the model into BigQuery ML, click on the model in the BigQuery UI and switch over to the "Schema" tab. This should tell you what features (column names) the model wants.
It is possible that when you created the TensorFlow/Keras model you did not assign names to the input nodes. Then, the feature names might have been auto-assigned to something like int1 and float2.
Alternately, run the program saved_model_cli on the model (it's a python program that comes with tensorflow) to see what the supported signature is
saved_model_cli show --dir $export_path --all
After some research, Auto ML encodes the input Tensors as Prensors which is a serialized string format shoved into a Tensor.
This means that you can't import the AutoML model from GCS directly into BQML the way you would import a TensorFlow model that explicitly encoded the different inputs as a json struct.
So, in order to import an AutoML model into BigQuery ML, the BigQuery engineering team would need to add support for something like model_type='automl' in addition to model_type='tensorflow'.
At the moment multi-column is not possible, from the AutoML Beginners guide:
One column from your dataset, called the target, is what your model will learn to predict. Some number of the other data columns are inputs (called features) that the model will learn patterns from. You can use the same input features to build multiple kinds of models just by changing the target.
Also found this feature request to Multi-target AutoML Tables Request

How to extract Tensorflow trained weights from graph.pbtxt to raw data

I have trained a custom neural network with the function:
tf.estimator.train_and_evaluate
After correct training, it contains the following files:
checkpoint
events.out.tfevents.1538489166.ti
model.ckpt-0.data-00000-of-00002
model.ckpt-0.index
model.ckpt-10.data-00000-of-00002
model.ckpt-10.index eval
graph.pbtxt
model.ckpt-0.data-00001-of-00002
model.ckpt-0.meta
model.ckpt-10.data-00001-of-00002
model.ckpt-10.meta
Now I need to export the weights and biases of every layer, into a raw data structure, e.g. an array, numpy.
I have read multiple pages on TensorFlow, and on other topics, but neither can find this question. The first thing I would assume to put the fils together into graph.pd with the freeze.py as suggested here:
Tensorflow: How to convert .meta, .data and .index model files into one graph.pb file
But then still the main question is unsolved.
If you wish to evaluate tensors alone, you can check out this question. But if you wish to e.g. deploy your network, you can take a look at TensorFlow serving, which is probably the most performant one right now. Or if you want to export this network to other frameworks and use them there, you can actually use ONNX for this purpose.
If saving weights and biases in a numpy array is your strict requirement, you can follow this example:
# In a TF shell, define all requirements and call the model function
y = model(x, is_training=False, reuse=tf.AUTO_REUSE) # For example
Once you call this function, you can see all the variables in the graph by running
tf.global_variables()
You need to restore all these variables from the latest checkpoint (say ckpt_dir) and then execute each of these variables to get the latest values.
checkpoint = tf.train.latest_checkpoint('./model_dir/')
fine_tune = tf.contrib.slim.assign_from_checkpoint_fn(checkpoint,
tf.global_variables(),
ignore_missing_vars=True)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
gv = sess.run(tf.global_variables())
Now gv will be a list of all the values of your variables (weights and biases); You can access any individual component via indexing - gv[5] etc. Or you can convert the entire thing into an array and save using numpy.
np.save('my_weights', np.array(gv))
This will save all your weights and biases in your current working directory as a numpy array - 'my_weights.npy'.
Hope this helps.

Using saved model for prediction in tensorflow

I use this code to restore my model, but I don't know how to predict after restoring it, which function can I use? I'm a beginner in tensorflow, I have no idea to which parameters or function will be saved.
In the meta model:
sess = tf.Session()
saver = tf.train.import_meta_graph("/home/MachineLearning/model.ckpt.meta")
saver.restore(sess,tf.train.latest_checkpoint('./'))
print("Model restored with success ")
x_predict,y_predict= load_svmlight_file('/MachineLearning/to_predict.csv')
x_predict = x_valid.toarray()
sess.run([] ,feed_dict ) #i don't know how to use predict function
These are the results:
$python predict.py
Model restored with success
Traceback (most recent call last):
File "predict.py", line 23, in <module>
sess.run([] ,feed_dict )
NameError: name 'feed_dict' is not defined
You're almost there. Tensorflow is simply a math library. Your graph is a collection of math operations with the associated dependencies (e.g. a graph, DAG specifically).
When you loaded the graph and associated variables (weights) you loaded all the definitions. Now you need to ask tensorflow to compute some value in the graph. There are lots of values it could compute, the one you want is often named logits (a typical name for the output layer of a neural network). But note that it could be named anything (especially if this isn't a neural network model), you need to understand the model. You might also want to compute an operation named accuracy which is defined to compute the accuracy of a particular batch of inputs (again depends on your model).
Note that you will need to provide tensorflow with whatever it needs to perform these computations. There is generally a placeholder where you pass in your data (and during training a placeholder for your labels which you don't need for prediction because none of the operations you will ask tensorflow to compute depend on it).
But you will need to get references to these various operations (logits, and accuracy) and placeholders (x is a typical name). Since you loaded your graph from disk you don't have the references (note that an alternative way of loading the model is to re-run the code that builds the model, which gives you easy access to the references you need).
In order to get the right references you can look them up by name. Here's how you would get a list of all the operations:
List of tensor names in graph in Tensorflow
Then to get a specific OP (operation) by name:
How to get a tensorflow op by name?
So you'll have something like this:
logits = tf.get_default_graph().get_operation_by_name("logits:0")
x = tf.get_default_graph().get_operation_by_name("x:0")
accuracy = tf.get_default_graph().get_operation_by_name("accuracy:0")
Note that the :0 is an index added to all names in tensorflow to avoid duplicate names. Now you have all the references you need and you can use sess.run to perform a specific computation, providing the input data, and OPs you'd like to have computed:
sess.run([logits, accuracy], feed_dict={x:your_input_data_in_numpy_format})
The names of these elements will vary in your implementation, I've used the most common names. If they weren't given pretty names it'll be hard to identify them and you'll need to look through the original code that produced the graph. In fact if they weren't named properly looking them up by name is so painful that it's probably better to just re-run the code that produced the original graph rather than import the meta graph. Notice that saver.restore only restores the actual data, import_meta_graph is the optional piece which can be replaced by simply re-building the graph programmatically.

Tensorflow: Keras, Estimators and custom input function

TF1.4 made Keras an integral part.
When trying to create Estimators from Keras models with propratery input function (I.e., not using the tf.estimator.inputs.numpy_input_fn) things are not working as Tensorflow can not fuse the model with the Input function.
I am using tf.keras.estimator.model_to_estimator
keras_estimator = tf.keras.estimator.model_to_estimator(
keras_model = keras_model,
config = run_config)
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn,
max_steps=self.train_steps)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn,
steps=None)
tf.estimator.train_and_evaluate(keras_estimator, train_spec, eval_spec)
and I get the following error message:
Cannot find %s with name "%s" in Keras Model. It needs to match '
'one of the following:
I found some reference for this topic here (strangely enough its hidden in the TF docs in the master branch - compare to this)
If you have the same issue - see my answer below. Might save you several hours.
So here is the deal. You must make sure that your custom Input Function returns a dictionary of {inputs} and a dictionary of {outputs}.
The dictionary keys must match your Keras input/output layers name.
From TF docs:
First, recover the input name(s) of Keras model, so we can use them as the
feature column name(s) of the Estimator input function
This is correct.
Here is how I did this:
# Get inputs and outout Keras model name to fuse them into the infrastructure.
keras_input_names_list = keras_model.input_names
keras_target_names_list = keras_model.output_names
Now, that you have the names, you need to go to your own input function and change it so it will deliver two dictionaries with the corresponding input and output names.
In my example, before the change, the input function returned [image_batch],[label_batch]. This is basically a bug because it is stated that the inputfn returns a dictionary and not a list.
To solve this, we need to wrap it up into a dict:
image_batch_dict = dict(zip(keras_input_names_list , [image_batch]))
label_batch_dict = dict(zip(keras_target_names_list , [label_batch]))
Only now, TF will be able to connect the input function to the Keras input layers.

TensorFlow input pipeline for deployment on CloudML

I'm relatively new to TensorFlow and I'm having trouble modifying some of the examples to use batch/stream processing with input functions. More specifically, what is the 'best' way to modify this script to make it suitable for training and serving deployment on Google Cloud ML?
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py
Something akin to this example:
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census/estimator/trainer
I can package it up and train it in the cloud, but I can't figure out how to apply even the simple vocab_processor transformations to an input tensor. I know how to do it with pandas, but there I can't apply the transformation to batches (using the chunk_size parameter). I would be very happy if I could reuse my pandas preprocessing pipelines in TensorFlow.
I think you have 3 options
1) You cannot reuse pandas preprocessing pipelines in TF. However, you could start TF with the output of your pandas preprocessing. So you could build a vocab and convert the text words to integers, and save a new preprocessed dataset to disk. Then read the integer data (which is encoding your text) in TF to do training.
2) You could build a vocab outside of TF in pandas. Then inside TF, after reading the words, you can make a table to map the text to integers. But if you are going to build a vocab outside of TF, you might as well do the transformation at the same time outside of TF, which is option 1.
3) Use tensorflow_transform. You can call tft.string_to_int() on the text column to automatically build the vocab and convert to integers. The output of tensorflow_transform is preprocessed data in tf.example format. Then training can start from the tf.example files. This is again option 1 but with tf.example files. If you want to run prediction on raw text data, this option allows you to make an exported graph that has the same text preprocessing built in, so you don't have to manage the preprocessing step at prediction time. However, this option is the most complicated as it introduces two additional ideas: tf.example files and beam pipelines.
For examples of tensorflow_transform see https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/criteo_tft
and
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/reddit_tft