How to reset tf.estimator.Estimator parameters? - TensorFlow

I tried using tf.Graph(), but I can't get the variable to be reset with the new values. The code is below:
with tf.Graph().as_default() as g:
    clf_ = tf.estimator.Estimator(model_fn=my_w2d.model_fn_wide2deep, params=param, model_dir="/Users/zhouliaoming/data/credit_dnn/model_retrain/rm_gene_v2_sall/")
    with tf.name_scope("rewrite"):
        clf2 = tf.estimator.Estimator(model_fn=my_w2d.model_fn_wide2deep, params=param, model_dir="/Users/zhouliaoming/data/credit_dnn/model_retrain/genev2_s0/")
    out_bias = tf.get_variable("output_0/bias")
    out_b_rew = tf.get_variable("rewrite/output_0/bias")
    vars_ = clf_.get_variable_names()  ## the Estimator only offers clf_.get_variable_value(name)
    print("vars: %r\n output_0/bias: %r\ntrain-vars: %r" % (vars_, clf_.get_variable_value('output_0/bias'), tf.contrib.framework.get_trainable_variables()))
    print("before rewrite: out_bias: %r, out_b_rew: %r" % (out_bias.eval(), out_b_rew.eval()))
    out_b_rew.assign(out_bias)
    print("after rewrite: out_bias: %r, out_b_rew: %r" % (out_bias.eval(), out_b_rew.eval()))
It just returns this error:
Traceback (most recent call last):
File "tf_utils.py", line 31, in <module>
out_bias = tf.get_variable("output_0/bias")
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1262, in get_variable
constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1097, in get_variable
constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 435, in get_variable
constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 404, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 764, in _get_single_variable
"but instead was %s." % (name, shape))
ValueError: Shape of a new variable (output_0/bias) must be fully defined, but instead was <unknown>.
=============== old information cut line =========
I defined a tf.estimator.Estimator model A with a model_fn.
I want to overwrite model A's parameters with those of an older model of the same architecture, saved as a ckpt file.
I tried to get model A's graph, look up the parameter variables in it, and assign them the old model's values.
Any advice is appreciated!
Thanks very much!

There are many ways of doing this, depending on exactly what you have available. For example, if you have the code and checkpoints for both models, you can create two separate graphs (with tf.Graph().as_default() as g:), load the two checkpoints into them, read the variable values from one graph, and assign them to variables in the other graph.
If you know exactly which variable you want to read from a checkpoint, you can restore just that variable (a tf.train.Saver can be constructed with a var_list of the variables to restore), or you can read it directly with tools like CheckpointReader.
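The tf.get_variable("output_0/bias") call in the question fails because constructing a tf.estimator.Estimator does not build its graph (that only happens inside train, evaluate or predict), so get_variable tries to create a brand-new variable and has no shape for it. Working with the checkpoint directly sidesteps that. Below is a minimal sketch of the CheckpointReader route, assuming TensorFlow with the tf.compat.v1 API is installed: it saves a stand-in "old model" checkpoint containing output_0/bias, reads that value with tf.train.load_checkpoint (which returns a CheckpointReader), and assigns it to rewrite/output_0/bias in a separate graph. The helper names save_old_model and copy_bias, and the variable values, are hypothetical.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

# Use TF1-style graphs, sessions and Saver (the question predates TF 2).
tf.disable_v2_behavior()


def save_old_model(ckpt_dir):
    """Hypothetical stand-in for the old model: one variable named output_0/bias."""
    graph = tf.Graph()
    with graph.as_default():
        with tf.variable_scope("output_0"):
            tf.get_variable("bias", initializer=tf.constant([0.5, -1.0]))
        saver = tf.train.Saver()
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            # Returns the checkpoint prefix, e.g. <ckpt_dir>/model.ckpt
            return saver.save(sess, os.path.join(ckpt_dir, "model.ckpt"))


def copy_bias(ckpt_path):
    """Read output_0/bias from a checkpoint and assign it to rewrite/output_0/bias."""
    reader = tf.train.load_checkpoint(ckpt_path)  # a CheckpointReader
    old_bias = reader.get_tensor("output_0/bias")  # returned as a numpy array

    graph = tf.Graph()  # a separate graph for the model being rewritten
    with graph.as_default():
        with tf.variable_scope("rewrite"), tf.variable_scope("output_0"):
            # The shape is known here because it comes from the checkpoint.
            new_bias = tf.get_variable(
                "bias", shape=old_bias.shape, initializer=tf.zeros_initializer())
        assign_op = new_bias.assign(old_bias)
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            before = sess.run(new_bias)
            sess.run(assign_op)
            after = sess.run(new_bias)
    return before, after


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt = save_old_model(tmp_dir)
    before, after = copy_bias(ckpt)
    print("before:", before, "after:", after)
```

With a real Estimator, the path passed to tf.train.load_checkpoint would be the old model's model_dir or a specific checkpoint prefix inside it, and the target variable would live in the graph built by your model_fn.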

Related

Running the Temporal Fusion Transformer default dataset: shape error

I ran the default Temporal Fusion Transformer code, downloaded from GitHub, in Google Colab.
After cloning, when I ran step 2, I couldn't get the test training run to work:
python3 -m script_train_fixed_params volatility outputs yes
The problem is the shape error below.
Computing best validation loss
Computing test loss
/usr/local/lib/python3.7/dist-packages/keras/engine/training_v1.py:2079: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
updates=self.state_updates,
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/MyDrive/tft_tf2/script_train_fixed_params.py", line 239, in <module>
use_testing_mode=True) # Change to false to use original default params
File "/content/drive/MyDrive/tft_tf2/script_train_fixed_params.py", line 156, in main
targets = data_formatter.format_predictions(output_map["targets"])
File "/content/drive/MyDrive/tft_tf2/data_formatters/volatility.py", line 183, in format_predictions
output[col] = self._target_scaler.inverse_transform(predictions[col])
File "/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py", line 1022, in inverse_transform
force_all_finite="allow-nan",
File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 773, in check_array
"if it contains a single sample.".format(array)
ValueError: Expected 2D array, got 1D array instead:
array=[-1.43120418 1.58885804 0.28558148 ... -1.50945972 -0.16713021
-0.57365613].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I've tried modifying the shape of the predictions DataFrame in format_predictions (data_formatters/volatility.py, line 183), since I guessed that's where the problem arises, but I couldn't fix it.
You have to change line 183 in volatility.py to
output[col] = self._target_scaler.inverse_transform(predictions[col].values.reshape(-1, 1))
and line 216 in electricity.py to
sliced_copy[col] = target_scaler.inverse_transform(sliced_copy[col].values.reshape(-1, 1))
Afterwards, the electricity example works fine, and I guess the same applies to volatility.
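The underlying issue is that scikit-learn scalers expect 2D input of shape (n_samples, n_features), while a single DataFrame column is 1D. A minimal sketch of the same fix as a reusable helper, assuming numpy and scikit-learn's StandardScaler; the helper name inverse_transform_column and the sample data are hypothetical. Unlike the one-line fix, it also flattens the result back to 1D so it maps cleanly onto a single column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def inverse_transform_column(scaler, values):
    """Undo a fitted single-feature scaler on one 1D column (hypothetical helper)."""
    column = np.asarray(values, dtype=float).reshape(-1, 1)  # (n,) -> (n, 1)
    return scaler.inverse_transform(column).ravel()  # (n, 1) -> (n,)


# Stand-in data: one target feature with three samples.
targets = np.array([[10.0], [20.0], [30.0]])
scaler = StandardScaler().fit(targets)

scaled = scaler.transform(targets).ravel()  # 1D, like a single predictions column
restored = inverse_transform_column(scaler, scaled)
print(restored)
```

For a pandas Series, np.asarray(series) gives the same underlying array as series.values, so the helper can be called with predictions[col] directly.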

TFX Transform Rank Mismatch While Loading/Applying TFX Beam Transform Graph

I've already successfully fit a TFTransformOutput to some data (in this case, the UCI Census dataset that is common in the TF and TFX examples). When I try to apply the transformer with the method transform_raw_features(raw_features), I keep getting the error:
ValueError: Node 'transform/transform/inputs/workclass_copy' has an _output_shapes attribute inconsistent with the GraphDef for output #0: Shapes must be equal rank, but are 0 and 1
Digging into the source code, it seems the error originates in saved_transform_io in the method _partially_apply_saved_transform_impl while doing:
saver = tf_saver.import_meta_graph(meta_graph_def, import_scope=import_scope,
input_map=input_map)
I examined the meta_graph_def produced by TFX TFTransform and Beam and noticed that the graph indeed has a series of copied variables with input/output rank differences. However, that is not something I have control over.
The column in the error message is "workclass", which is a simple categorical column. What might I be doing incorrectly? What is the best way to debug this? At this point, I've already dug deep into the TF source code, but the error seems to originate in how the TFTransform graph was written, and I'm not sure what levers I have to change or fix that.
This is using TF Transform v0.9 and the corresponding TF v1.9.
Traceback (most recent call last):
  File "/home/sahmed/workspace/ml_playground/TFX-TFT/trainers.py", line 449, in parse_csv
    transformed_stuff=xformer.transform_raw_features(raw_features)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/output_wrapper.py", line 122, in transform_raw_features
    self.transform_savedmodel_dir, raw_features))
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 360, in partially_apply_saved_transform_internal
    saved_model_dir, logical_input_map, tensor_replacement_map)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 218, in _partially_apply_saved_transform_impl
    input_map=input_map)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
    **kwargs)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 422, in import_graph_def
    raise ValueError(str(e))
ValueError: Node 'transform/transform/inputs/workclass_copy' has an _output_shapes attribute inconsistent with the GraphDef for output #0: Shapes must be equal rank, but are 0 and 1
The issue is likely that the shape of the workclass tensor is incompatible with what transform_raw_features expects.
TFTransformOutput.transform_raw_features() expects these features to have the same characteristics as described in the metadata given to tft.AnalyzeDataset() similarly to how it's done in this example:
https://github.com/tensorflow/transform/blob/master/examples/simple_example.py#L63
Could you take a look at the metadata used in your pipeline and see that it is compatible with the data fed into TFTransformOutput.transform_raw_features()?
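One way to debug this is to compare the rank of each raw feature you pass in against the per-example shape in your metadata: a per-example scalar feature such as workclass (for example FixedLenFeature([], tf.string)) gains a leading batch dimension when batched, so a batch of it should have rank 1, not rank 0. A minimal sketch of such a check, assuming your raw features are numpy arrays (or objects exposing a .shape of known rank) and that you copy the per-example shapes from your own feature spec by hand; rank_mismatches and the sample features are hypothetical and not part of tf.Transform.

```python
import numpy as np


def rank_mismatches(raw_features, per_example_shapes):
    """Map feature name -> (expected_rank, actual_rank) for features whose rank is off.

    per_example_shapes mirrors the shapes in your feature spec, e.g. [] for
    FixedLenFeature([], tf.string); batching adds one leading dimension.
    """
    mismatches = {}
    for name, shape in per_example_shapes.items():
        value = raw_features[name]
        # Use .shape when available (numpy arrays, tensors of known rank), else infer it.
        actual_shape = value.shape if hasattr(value, "shape") else np.shape(value)
        expected_rank = len(shape) + 1  # +1 for the batch dimension
        actual_rank = len(actual_shape)
        if actual_rank != expected_rank:
            mismatches[name] = (expected_rank, actual_rank)
    return mismatches


# Hypothetical inputs: workclass was passed as a single scalar instead of a batch.
raw_features = {
    "workclass": np.array("Private"),
    "age": np.array([39, 50]),
}
print(rank_mismatches(raw_features, {"workclass": [], "age": []}))
```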

UnicodeDecodeError from tf.train.import_meta_graph

I serialized a TensorFlow model with the following code ...
save_path = self.saver.save(self.session, os.path.join(self.logdir, "model.ckpt"), global_step)
logging.info("Model saved in file: %s" % save_path)
... and I'm now trying to restore it from scratch in a separate file using the following code:
saver = tf.train.import_meta_graph(PROJ_DIR + '/logs/default/model.ckpt-54.meta')
session = tf.Session()
saver.restore(session, PROJ_DIR + '/logs/default/model.ckpt-54')
print('Model restored')
When tf.train.import_meta_graph is called, the following exception is thrown:
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Traceback (most recent call last):
File "/home/reid/projects/research/ccg/taggerflow_modified/test/tf_restore.py", line 4, in <module>
saver = tf.train.import_meta_graph(PROJ_DIR + '/logs/default/model.ckpt-54.meta')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1711, in import_meta_graph
read_meta_graph_file(meta_graph_or_file), clear_devices)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1563, in read_meta_graph_file
text_format.Merge(file_content.decode("utf-8"), meta_graph_def)
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 1: invalid start byte
For reference, here are the first few lines of <PROJ_DIR>/logs/default/model.ckpt-54.meta:
<A7>:^R<A4>:
9
^CAdd^R^F
^Ax"^AT^R^F
^Ay"^AT^Z^F
^Az"^AT"^Z
^AT^R^Dtype:^O
^M2^K^S^A^B^D^F^E^C ^R^G
I think TensorFlow is using a different encoding when serializing than when deserializing. How do we specify the encoding that TensorFlow uses when serializing/deserializing? Or is the solution something different?
I was facing the same issue. Have you made sure that, apart from the
.meta, .data-00000-of-00001 and .index files,
the file named 'checkpoint' is also present in the directory you're loading the model from?
My issue was resolved once I made sure of this. Hope this helps!
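A minimal sketch of that check, using only the Python standard library: it lists which of the files named above are absent for a given checkpoint prefix. The helper name missing_checkpoint_files is hypothetical, and the example directory is a throwaway temporary directory populated with empty files rather than a real checkpoint.

```python
import glob
import os
import tempfile


def missing_checkpoint_files(directory, prefix):
    """Return which of the checkpoint files discussed above are absent (hypothetical helper)."""
    missing = [
        name
        for name in ("checkpoint", prefix + ".meta", prefix + ".index")
        if not os.path.isfile(os.path.join(directory, name))
    ]
    # Data shards are named like <prefix>.data-00000-of-00001.
    data_pattern = glob.escape(os.path.join(directory, prefix)) + ".data-*"
    if not glob.glob(data_pattern):
        missing.append(prefix + ".data-*")
    return missing


# Example: a directory holding the model.ckpt-54 files but no 'checkpoint' file.
with tempfile.TemporaryDirectory() as logdir:
    for suffix in (".meta", ".index", ".data-00000-of-00001"):
        open(os.path.join(logdir, "model.ckpt-54" + suffix), "w").close()
    result = missing_checkpoint_files(logdir, "model.ckpt-54")
    print(result)
```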

Converting google-cloud-ml github Reddit example from regression to classification and adding keys?

I've been trying to adapt the reddit_tft example from the cloud-ml github samples repo to my needs.
I've been able to get it running as per the tutorial readme.
However, what I want to use it for is a binary classification problem, and I also want to output keys in batch prediction.
So I made a copy of the tutorial code here and changed it in a few places so that a deep_classifier model type uses a DNNClassifier instead of a DNNRegressor.
I've changed the score variable to be
if(score>0,1,0) as score
It trains fine and deploys to Cloud ML, but I'm not sure how to get keys back from my predictions.
I've updated the SQL that pulls from BigQuery to include id as example_id here.
It seems the code from the tutorial had some sort of placeholder for example_id, so I'm trying to leverage that.
It all seems to work, but when I get batch predictions, all I get is JSON like this:
{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]}
{"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457]}
...
So example_id does not seem to be making it into the serving functions as I need.
I've tried to follow the approach here, which is based on adapting the census example for keys.
I just can't figure out how to finish adapting this reddit example to also output keys in the predictions, as it looks a bit different to me in terms of design and the functions used.
Update 1
My latest attempt is here, trying to use the approach outlined here.
However, this is giving errors:
NotFoundError (see above for traceback): /tmp/tmp2jllvb/model.ckpt-1_temp_9530d2c5823d4462be53fa5415e429fd; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, dnn/hiddenlayer_0/kernel/part_2/read, dnn/dnn/hiddenlayer_0/kernel/part_2/Adagrad/read, dnn/hiddenlayer_1/kernel/part_2/read, dnn/dnn/hiddenlayer_1/kernel/part_2/Adagrad/read, dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/read, dnn/dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/Adagrad/read, dnn/logits/bias/part_0/read, dnn/dnn/logits/bias/part_0/Adagrad/read, global_step)]]
Update 2
My latest attempt and details are here.
I'm now getting an error from tensorflow-transform (run_preprocess.sh works fine in tft 0.1):
File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 282, in __setstate__
self._dtype = tf.as_dtype(state['dtype'])
TypeError: string indices must be integers, not str
Update 3
I've changed things to use just Beam + CSV and avoid tft. I'm also now using the approach outlined here for extending the canned estimator to get the key back with the predictions.
However, when following this post to try to get the comments in as features, I'm now running into a new error:
The replica worker 3 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 87, in new_model_fn
    spec = estimator.model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 203, in public_model_fn
    return self._call_model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 520, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 158, in _dnn_linear_combined_model_fn
    dnn_logits = dnn_logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn.py", line 89, in dnn_logit_fn
    features=features, feature_columns=feature_columns)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/feature_column/feature_column.py", line 226, in input_layer
    with variable_scope.variable_scope(None, default_name=column.name):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1826, in __enter__
    current_name_scope_name = self._current_name_scope.__enter__()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4932, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3514, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding' is not a valid scope name
My repo for this attempt/approach is here. This all runs fine if I use just subreddit as a feature; it's adding the comment feature that seems to cause the problems. Lines 103 to 111 are where I followed this approach.
From reading the trace, I'm not sure what in my code is triggering the error. Does anyone have any ideas?
Or can anyone point me towards another approach for going from text to bag-of-words (BOW) to an embedding feature in TF?
See:
https://medium.com/#lakshmanok/how-to-extend-a-canned-tensorflow-estimator-to-add-more-evaluation-metrics-and-to-pass-through-ddf66cd3047d
Here's what the code looks like to pass through keys:
import tensorflow as tf  # TF 1.x: tf.contrib no longer exists in TF 2

KEY_COLUMN = 'example_id'  # assumption: the key feature used in the question

def forward_key_to_export(estimator):
    estimator = tf.contrib.estimator.forward_features(estimator, KEY_COLUMN)
    ## This shouldn't be necessary (I've filed CL/187793590 to update extenders.py with this code)
    config = estimator.config
    def model_fn2(features, labels, mode):
        estimatorSpec = estimator._call_model_fn(features, labels, mode, config=config)
        if estimatorSpec.export_outputs:
            for ekey in ['predict', 'serving_default']:
                estimatorSpec.export_outputs[ekey] = \
                    tf.estimator.export.PredictOutput(estimatorSpec.predictions)
        return estimatorSpec
    return tf.estimator.Estimator(model_fn=model_fn2, config=config)
##
# Create estimator to train and evaluate
def train_and_evaluate(output_dir):
    estimator = tf.estimator.DNNLinearCombinedRegressor(...)
    estimator = forward_key_to_export(estimator)
    ...
    tf.estimator.train_and_evaluate(estimator, ...)
We have plans, but haven't yet moved the output-key changes into Census. In the meantime, can you please see if this gist helps: https://gist.github.com/andrewm4894/ebd3ac3c87e2ab4af8a10740e85073bb#file-with_keys_model-py
Please feel free to send a PR if you get to it sooner and we will merge your contribution.

Using graph_metrics.py with a saved graph

I want to view statistics of my model by saving my graph to a file then running graph_metrics.py.
I have tried a few different things to write the file; my best effort is:
tf.train.write_graph( session.graph_def, ".", "my_graph", as_text=True )
But here's what happens:
$ python ./util/graph_metrics.py --noinput_binary --graph my_graph
Traceback (most recent call last):
File "./util/graph_metrics.py", line 137, in <module>
tf.app.run()
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "./util/graph_metrics.py", line 85, in main
FLAGS.batch_size)
File "./util/graph_metrics.py", line 109, in calculate_graph_metrics
input_tensor = sess.graph.get_tensor_by_name(input_layer)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2531, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2385, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2427, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'Mul:0' refers to a Tensor which does not exist. The operation, 'Mul', does not exist in the graph."
Is there a complete working example of saving a graph, then analyzing it with graph_metrics.py?
This process seems to involve a magic incantation that I haven't yet discovered.
You're hitting this error because you need to specify the name of your own input node with --input_layer= (it defaults to Mul:0 because that's what we use in one of our Inception models):
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/graph_metrics.py#L51
The graph_metrics script is still very much a work in progress unfortunately, and you may hit problems with shape inference, but hopefully this should get you past the initial hurdle.
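To find a candidate value for --input_layer in a graph written with tf.train.write_graph(..., as_text=True), you can list its Placeholder nodes. A minimal sketch using only the standard library, assuming your input is fed through a Placeholder op (if it is not, as with the Inception Mul:0 input, you would inspect the graph by hand); the helper name placeholder_tensor_names and the sample graph text are hypothetical. In a text-format GraphDef, each node prints its name field directly before its op field, which is what the regular expression relies on.

```python
import re

# Matches a node's name field followed by op: "Placeholder".
_PLACEHOLDER_RE = re.compile(r'name:\s*"([^"]+)"\s*op:\s*"Placeholder"')


def placeholder_tensor_names(graph_pbtxt):
    """Return "<node>:0" for every Placeholder node in a text-format GraphDef."""
    return [name + ":0" for name in _PLACEHOLDER_RE.findall(graph_pbtxt)]


# Hypothetical excerpt of a file written with tf.train.write_graph(..., as_text=True).
SAMPLE_GRAPH = """
node {
  name: "input"
  op: "Placeholder"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "Mul"
  op: "Mul"
  input: "input"
  input: "input"
}
"""

print(placeholder_tensor_names(SAMPLE_GRAPH))
```

For the real file, you would read my_graph into a string, pick the right name from the list, and pass it as, for example, --input_layer=input:0 alongside --noinput_binary --graph my_graph.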